Noncoherent Capacity of Underspread Fading Channels

Giuseppe Durisi, Member, IEEE, Ulrich G. Schuster, Student Member, IEEE, 
Helmut Bölcskei, Senior Member, IEEE, Shlomo Shamai (Shitz), Fellow, IEEE 

Abstract 

We derive bounds on the noncoherent capacity of wide-sense stationary uncorrelated scattering (WSSUS) 
channels that are selective both in time and frequency, and are underspread, i.e., the product of the channel's 
delay spread and Doppler spread is small. For input signals that are peak constrained in time and frequency, 
we obtain upper and lower bounds on capacity that are explicit in the channel's scattering function, are 
accurate for a large range of bandwidths, and allow us to coarsely identify the capacity-optimal bandwidth as a 
function of the peak power and the channel's scattering function. We also obtain a closed-form expression 
for the first-order Taylor series expansion of capacity in the limit of large bandwidth, and show that our 
bounds are tight in the wideband regime. For input signals that are peak constrained in time only (and, 
hence, allowed to be peaky in frequency), we provide upper and lower bounds on the infinite-bandwidth 
capacity and find cases when the bounds coincide and the infinite-bandwidth capacity is characterized 
exactly. Our lower bound is closely related to a result by Viterbi (1967). 

The analysis in this paper is based on a discrete-time discrete-frequency approximation of WSSUS 
time- and frequency-selective channels. This discretization explicitly takes into account the underspread 
property, which is satisfied by virtually all wireless communication channels. 

This work was supported in part by the Swiss Kommission für Technologie und Innovation (KTI) under grant 6715.2 ENS-ES, 
and by the European Commission as part of the Integrated Project PULSERS Phase II under contract FP6-027142, and as part of the 
FP6 Network of Excellence NEWCOM. 

G. Durisi and H. Bölcskei are with the Communication Technology Laboratory, ETH Zurich, 8092 Zurich, Switzerland (e-mail: 
{gdurisi, boelcskei}@nari.ee.ethz.ch). 

U. G. Schuster was with the Communication Technology Laboratory, ETH Zurich, and is now with Celestrius AG, Zurich, 
Switzerland. 

S. Shamai (Shitz) is with Technion, Israel Institute of Technology, 32000 Haifa, Israel (e-mail: sshlomo@ee.technion.ac.il). 
This paper was presented in part at the IEEE International Symposium on Information Theory, Seattle, WA, U.S.A., July 2006, 
and at the IEEE International Symposium on Information Theory, Nice, France, June 2007. 



April 10, 2008 



DRAFT 






I. Introduction and Outline 

1) Models for fading channels: Channel capacity is a benchmark for the design of any communication
system. The techniques used to compute, or at least to bound, channel capacity often provide 
guidelines for the design of practical systems, e.g., how to best utilize the resources bandwidth and 
power, and how to design efficient modulation and coding schemes [1, Sec. III.3]. Our goal in this 
paper is to analyze the capacity of wireless communication channels that are of direct practical 
importance. We believe that an accurate stochastic model for such channels should take the following 
aspects into account: 

• The channel is selective in time and frequency, i.e., it exhibits memory in frequency and in 
time, respectively. 

• Neither the transmitter nor the receiver knows the instantaneous realization of the channel. 

• The peak power of the input signal is limited. 

These aspects are important because they arise from practical limitations of real-world communica- 
tion systems: temporal variations of the environment and multipath propagation are responsible for 
channel selectivity in time and frequency, respectively [2], [3]; perfect channel knowledge at the 
receiver is impossible to obtain because channel state information needs to be extracted from the 
received signal; finally, realizable transmitters are always limited in their peak output power [4]. The 
above aspects are also fundamental as they significantly impact the behavior of channel capacity: for 
example, the capacity of a block-fading channel behaves differently from the capacity of a channel 
that is stationary in time [5]; channel capacity with perfect channel knowledge at the receiver is 
always larger than the capacity without channel knowledge [6], and the signaling schemes necessary 
to achieve capacity are also very different in the two cases [1]; finally, a peak constraint on the 
transmit signal can lead to vanishing capacity in the large-bandwidth limit [7]-[9], while without a 
peak constraint the infinite-bandwidth AWGN capacity can be attained asymptotically [7], [10]-[15]. 

Small-scale fading of wireless channels can be sensibly modeled as a stochastic Gaussian linear 
time-varying (LTV) system [2]; in particular, we base our developments on the widely used 
wide-sense stationary uncorrelated scattering (WSSUS) model for random LTV channels [16], [12]. Like 
most models for real-world channels, the WSSUS model is time continuous; however, almost all 
tools for information-theoretic analysis of noisy channels require a discretized representation of the 
channel's input-output relation. Several approaches to discretize random LTV channels are proposed 
in the literature, e.g., sampling [8], [16], [17] or basis expansion [18], [19]; all these discretized 
models incur an approximation error with respect to the continuous-time WSSUS model that is often 
difficult to quantify. As virtually all wireless channels of practical interest are underspread, i.e., the 
product of maximum delay and maximum Doppler shift is small, we build our information-theoretic 
analysis upon a discretization of LTV channels, proposed by Kozek [20], that explicitly takes into 
account the underspread property to minimize the approximation error in the mean-square sense. 

2) Capacity of noncoherent WSSUS channels: Throughout the paper, we assume that both the 
transmitter and receiver know the channel law,¹ but both are ignorant of the channel realization, a 
setting often called noncoherent. In the following, we refer to channel capacity in the noncoherent 
setting simply as "capacity". In contrast, in the coherent setting the receiver is also assumed to know 
the channel realization perfectly; the corresponding capacity is termed coherent capacity. 

A general closed-form expression for the capacity of Rayleigh-fading channels is not known, 
even if the channel is memoryless [22]. However, several asymptotic results are available. If only a 
constraint on the average transmitted power is imposed, the AWGN capacity can be achieved in 
the infinite-bandwidth limit also in the presence of fading. This result is quite robust, as it holds 
for a wide variety of channel models [7], [10]-[15]. Verdú showed that flash signaling, which 
implies unbounded peak power of the input signal, is necessary and sufficient to achieve the 
infinite-bandwidth AWGN capacity on block-memoryless fading channels [14]; a form of flash signaling is 
also infinite-bandwidth optimal for the more general time- and frequency-selective channel model 
used in the present paper [15]. In contrast, if the peakiness of the input signal is restricted, the 
infinite-bandwidth capacity behavior of most fading channels changes drastically, and the limit 
depends on the type of peak constraint imposed [7]-[9], [13], [23]. In this paper, we shall distinguish 
between a peak constraint in time and a peak constraint in time and frequency. 

a) Peak constraint in time: No closed-form capacity expression, not even in the infinite-bandwidth 
limit, seems to exist to date for time- and frequency-selective WSSUS channels. Viterbi's 
analysis [23] provides a result that can be interpreted as a lower bound on the infinite-bandwidth 
capacity of time- and frequency-selective channels. This lower bound is in the form of the 
infinite-bandwidth AWGN capacity minus a penalty term that depends on the channel's power-Doppler 
profile [16]. For channels that are time selective but frequency flat, structurally similar expressions 
were found for the infinite-bandwidth capacity [24], [25] and for the capacity per unit energy [26]. 

1 This implies that the codebook and the decoding strategy can be optimized accordingly [21]. 

b) Peak constraint in time and frequency: Although a closed-form capacity expression valid 
for all bandwidths is not available, it is known that the infinite-bandwidth capacity is zero for 
various channel models [7]-[9]. This asymptotic capacity behavior implies that signaling schemes 
that spread the transmit energy uniformly across time and frequency perform poorly in the 
large-bandwidth regime. Even more useful for performance assessment would be capacity bounds for 
finite bandwidth. For frequency-flat time-selective channels, such bounds can be found in [27], 
[28], while for the more general time- and frequency-selective case treated in the present paper, 
upper bounds seem to exist only on the rates achievable with particular signaling schemes, namely 
for orthogonal frequency-division multiplexing (OFDM) with constant-modulus symbols [29], 
and for multiple-input multiple-output (MIMO) OFDM with unitary space-frequency codes over 
frequency-selective block-fading channels [30]. 

3) Contributions: We use the discrete-time discrete-frequency approximation of continuous-time 
underspread WSSUS channels proposed in [20], to obtain the following results: 

• We derive upper and lower bounds on capacity under a constraint on the average power and 
under a peak constraint in both time and frequency. These bounds are valid for any bandwidth, 
are explicit in the channel's scattering function, and generalize the results on achievable rates 
in [29]. In particular, our bounds allow us to coarsely identify the capacity-optimal bandwidth 
for a given peak constraint and a given scattering function. 

• Under the same peak constraint in time and frequency, we find the first-order Taylor series 
expansion of channel capacity in the limit of infinite bandwidth. This result extends the 
asymptotic capacity analysis for frequency-flat time-selective channels in [28] to channels that 
are selective in both time and frequency. 

• In the infinite-bandwidth limit and for transmit signals that are peak-constrained in time only, 
we recover Viterbi's capacity lower bound [23]. In addition, we derive an upper bound that is 
shown to coincide with the lower bound for a specific class of channels; hence, the infinite- 
bandwidth capacity for this class of channels is established. 

The results in this paper rely on several flavors of Szegő's theorem on the asymptotic eigenvalue 
distribution of Toeplitz matrices [31], [32]; in particular, we use various extensions of Szegő's 
theorem to two-level Toeplitz matrices, i.e., block-Toeplitz matrices that have Toeplitz blocks [33], 
[34]. Another key ingredient for several of our proofs is the relation between mutual information 
and minimum mean-square error (MMSE) discovered recently by Guo et al. [35]. Furthermore, we 
use a property of the information divergence of orthogonal signaling schemes derived by Butman 
and Klass [36]. 

4) Notation: Uppercase boldface letters denote matrices and lowercase boldface letters designate 
vectors. The superscripts $T$, $*$, and $H$ stand for transposition, element-wise conjugation, and Hermitian 
transposition, respectively. For two matrices $\mathbf{A}$ and $\mathbf{B}$ of appropriate dimensions, the Hadamard 
product is denoted as $\mathbf{A} \odot \mathbf{B}$. We designate the identity matrix of dimension $N \times N$ as $\mathbf{I}_N$ and 
the all-zero vector of appropriate dimension as $\mathbf{0}$. We let $\operatorname{diag}(\mathbf{x})$ denote a diagonal square matrix 
whose main diagonal contains the elements of the vector $\mathbf{x}$. The determinant, trace, and rank of the 
matrix $\mathbf{X}$ are denoted as $\det(\mathbf{X})$, $\operatorname{tr}(\mathbf{X})$, and $\operatorname{rank}(\mathbf{X})$, respectively, and $\lambda_i(\mathbf{X})$ is the $i$th eigenvalue 
of a square matrix $\mathbf{X}$. The function $\delta(x)$ is the Dirac distribution, and $\delta[n]$ is defined as $\delta[0] = 1$ 
and $\delta[n] = 0$ for all $n \neq 0$. All logarithms are to the base $e$. The real part of the complex number $z$ is 
denoted $\Re\{z\}$. We write $\mathcal{A} - \mathcal{B}$ for the set difference between the sets $\mathcal{A}$ and $\mathcal{B}$. For two functions $f(x)$ 
and $g(x)$, the notation $f(x) = o(g(x))$ for $x \to 0$ means that $\lim_{x \to 0} f(x)/g(x) = 0$. With $\lfloor x \rfloor$ we 
denote the largest integer smaller than or equal to $x \in \mathbb{R}$. A signal is an element of the Hilbert space $\mathcal{L}^2$ 
of square integrable functions. The inner product between two signals $f(x)$ and $g(x)$ is denoted 
as $\langle f, g \rangle = \int_{-\infty}^{\infty} f(x)\, g^*(x)\, dx$. For a random variable (RV) $x$ with distribution $Q_x$, we write $x \sim Q_x$. 
We denote expectation by $\mathbb{E}[\cdot]$, and use the notation $\mathbb{E}_x[\cdot]$ to stress that the expectation is taken 
with respect to the RV $x$. We write $D(Q_x \| Q_y)$ for the Kullback-Leibler (KL) divergence between 
the two distributions $Q_x$ and $Q_y$. Finally, $\mathcal{CN}(\mathbf{m}, \mathbf{R})$ stands for the distribution of a jointly proper 
Gaussian (JPG) random vector with mean $\mathbf{m}$ and covariance matrix $\mathbf{R}$. 

II. Channel and System Model 

A channel model needs to strike a balance between generality, accuracy, engineering relevance, 
and mathematical tractability. In the following, we start from the classical WSSUS model for 
LTV channels [16], [12] because it is a fairly general, yet accurate and mathematically tractable 
model that is widely used. This model has a continuous-time input-output relation, which is difficult 
to use as a basis for information-theoretic studies. However, if the channel is underspread, it is 
possible to closely approximate the original WSSUS input-output relation by a discretized 
input-output relation that is especially suited for the derivation of capacity bounds. In particular, the bounds 
we derive in this paper can be directly related to the underlying continuous-time WSSUS channel 
as they are explicit in its scattering function. 

A. Time- and Frequency-Selective Underspread Fading Channels 

1) The channel operator: A wireless channel can be described as a linear operator $\mathbb{H} : \mathcal{L}^2 \to \mathcal{R}_{\mathbb{H}}$ 
that maps an input signal $x(t)$ into an output signal $r(t) \in \mathcal{R}_{\mathbb{H}}$, where $\mathcal{R}_{\mathbb{H}} \subset \mathcal{L}^2$ denotes the range 
space of $\mathbb{H}$ [37]. The corresponding noise-free input-output relation is then $r(t) = (\mathbb{H}x)(t)$. 

It is sensible to model wireless channels as random, first because a deterministic description 
of the physical propagation environment is too complex in most cases of practical interest, and 
second because a stochastic description is much more robust, in the sense that systems designed on 
the basis of a stochastic channel model can be expected to work in a variety of different propagation 
environments [3]. Consequently, we assume that $\mathbb{H}$ is a random operator. 

2) System functions: Because communication takes place over a finite bandwidth and a finite 
time duration, we can assume that each realization of $\mathbb{H}$ is a Hilbert-Schmidt operator [38], [39]. 
Hence, the noise-free input-output relation of the LTV channel can be written as² [38, p. 1083] 

$$r(t) = (\mathbb{H}x)(t) = \int_{t'} k_{\mathbb{H}}(t, t')\, x(t')\, dt' \tag{1}$$

where the kernel $k_{\mathbb{H}}(t, t')$ can be interpreted as the channel response at time $t$ to a Dirac impulse at 
time $t'$. Instead of two variables that denote absolute time, it is common in the engineering literature 
to use absolute time $t$ and delay $\tau$. This leads to the time-varying impulse response $h_{\mathbb{H}}(t, \tau) = 
k_{\mathbb{H}}(t, t - \tau)$ and the corresponding noise-free input-output relation [16] 

$$r(t) = \int_{\tau} h_{\mathbb{H}}(t, \tau)\, x(t - \tau)\, d\tau. \tag{2}$$
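
The change of variables between the kernel in (1) and the time-varying impulse response in (2) can be checked on a small discrete stand-in. The following is a minimal sketch, assuming a finite matrix `K` in place of the continuous kernel and integer delays; the function names and the numerical values are hypothetical and are not taken from the paper.

```python
def kernel_to_impulse_response(K):
    """Map a discrete kernel K[t][t'] to h[(t, tau)] = K[t][t - tau].

    Delays tau may be negative, so h is stored as a dict keyed by (t, tau).
    """
    n = len(K)
    return {(t, tau): K[t][t - tau]
            for t in range(n)
            for tau in range(t - n + 1, t + 1)}


def apply_kernel(K, x):
    """Discrete analogue of (1): r[t] = sum over t' of K[t][t'] * x[t']."""
    return [sum(K[t][s] * x[s] for s in range(len(x))) for t in range(len(K))]


def apply_impulse_response(h, x):
    """Discrete analogue of (2): r[t] = sum over tau of h[(t, tau)] * x[t - tau]."""
    n = len(x)
    return [sum(h[(t, tau)] * x[t - tau] for tau in range(t - n + 1, t + 1))
            for t in range(n)]


# Hypothetical 3x3 kernel and input, chosen only for illustration.
K = [[1, 2, 0],
     [0, 1, 2],
     [3, 0, 1]]
x = [1, -1, 2]
r_kernel = apply_kernel(K, x)
r_delay = apply_impulse_response(kernel_to_impulse_response(K), x)
```

Both descriptions give the same output sequence, since the second is only a re-indexing of the first in terms of delay.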

Two more system functions that will be important in the following developments are the time-varying 
transfer function³ 

$$L_{\mathbb{H}}(t, f) = \int_{\tau} h_{\mathbb{H}}(t, \tau)\, e^{-j2\pi f\tau}\, d\tau \tag{3}$$

and the spreading function 

$$S_{\mathbb{H}}(\nu, \tau) = \int_{t} h_{\mathbb{H}}(t, \tau)\, e^{-j2\pi\nu t}\, dt = \int_{t}\int_{f} L_{\mathbb{H}}(t, f)\, e^{-j2\pi(\nu t - \tau f)}\, dt\, df. \tag{4}$$

2 All integrals are from $-\infty$ to $\infty$ unless stated otherwise. 

3 As $\mathbb{H}$ is of Hilbert-Schmidt type, the time-varying impulse response $h_{\mathbb{H}}(t, \tau)$ is square integrable, and the Fourier transforms 
in (3) and (4) are well defined. 



In particular, if we rewrite the input-output relation (2) in terms of the spreading function $S_{\mathbb{H}}(\nu, \tau)$ 
as 

$$r(t) = \int_{\nu}\int_{\tau} S_{\mathbb{H}}(\nu, \tau)\, x(t - \tau)\, e^{j2\pi t\nu}\, d\tau\, d\nu \tag{5}$$

we obtain an intuitive physical interpretation: the output signal $r(t)$ is a weighted superposition 
of copies of the input signal $x(t)$ that are shifted in time by the delay $\tau$ and in frequency by the 
Doppler shift $\nu$. 
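
The superposition in (5) can be sketched for a channel with a handful of point scatterers. This is an illustration under stated assumptions: the spreading function is replaced by a short list of (gain, delay, Doppler) triples, the input is sampled at a hypothetical rate `fs`, and delays are nonnegative integers in samples. None of these names or values come from the paper.

```python
import cmath


def apply_spreading(x, paths, fs):
    """Noise-free output of a channel made of a few discrete scatterers.

    x     : list of complex input samples taken at rate fs (Hz)
    paths : list of (gain, delay_in_samples, doppler_hz) tuples, a
            hypothetical discrete stand-in for the spreading function
    """
    r = [0j] * len(x)
    for gain, delay, doppler in paths:
        for n in range(delay, len(x)):
            t = n / fs
            # Copy of x delayed by `delay` samples and shifted in
            # frequency by `doppler`, weighted by `gain`, as in (5).
            r[n] += gain * x[n - delay] * cmath.exp(2j * cmath.pi * doppler * t)
    return r


# One pure-delay path and one pure-Doppler path, for illustration only.
delayed = apply_spreading([1, 0, 0, 0], [(0.5, 1, 0.0)], fs=4.0)
shifted = apply_spreading([1, 1, 1, 1], [(1.0, 0, 1.0)], fs=4.0)
```

In the first call the impulse reappears one sample later with half the amplitude; in the second, the constant input acquires a rotating phase at the Doppler frequency.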

3) Stochastic characterization and WSSUS assumption: For mathematical tractability, we need 
to make additional assumptions on the system functions. First, we assume that $L_{\mathbb{H}}(t, f)$ is a 
zero-mean JPG random process in $t$ and $f$. Indeed, the Gaussian distribution is empirically supported 
for narrowband channels [2], and even ultrawideband (UWB) channels with bandwidth up to 
several gigahertz can be modeled as Gaussian distributed [40]. By virtue of the Gaussian 
assumption, $L_{\mathbb{H}}(t, f)$ is completely characterized by its correlation function. Yet, this correlation function 
is four-dimensional in general and thus difficult to work with. A further simplification is possible if 
we assume that the channel process is wide-sense stationary in time $t$ and uncorrelated in delay $\tau$, 
the so-called WSSUS assumption [16]. As a consequence, $L_{\mathbb{H}}(t, f)$ is wide-sense stationary both 
in time $t$ and frequency $f$, or, equivalently, $S_{\mathbb{H}}(\nu, \tau)$ is uncorrelated in Doppler $\nu$ and delay $\tau$ [16]: 

$$\mathbb{E}[L_{\mathbb{H}}(t, f)\, L_{\mathbb{H}}^*(t', f')] = R_{\mathbb{H}}(t - t', f - f')$$
$$\mathbb{E}[S_{\mathbb{H}}(\nu, \tau)\, S_{\mathbb{H}}^*(\nu', \tau')] = C_{\mathbb{H}}(\nu, \tau)\, \delta(\nu - \nu')\, \delta(\tau - \tau').$$

The function $R_{\mathbb{H}}(t, f)$ is called the channel's (time-frequency) correlation function, and $C_{\mathbb{H}}(\nu, \tau)$ is 
called the scattering function of the channel $\mathbb{H}$. The two functions are related by a two-dimensional 
Fourier transform, 

$$C_{\mathbb{H}}(\nu, \tau) = \int_{t}\int_{f} R_{\mathbb{H}}(t, f)\, e^{-j2\pi(\nu t - \tau f)}\, dt\, df. \tag{6}$$

As $R_{\mathbb{H}}(t, f)$ is stationary in $t$ and $f$, $C_{\mathbb{H}}(\nu, \tau)$ is nonnegative and real-valued for all $\nu$ and $\tau$, and 
can be interpreted as the spectrum of the channel process. The power-delay profile of $\mathbb{H}$ is defined 
as 

$$q_{\mathbb{H}}(\tau) = \int_{\nu} C_{\mathbb{H}}(\nu, \tau)\, d\nu$$

and the power-Doppler profile as 

$$p_{\mathbb{H}}(\nu) = \int_{\tau} C_{\mathbb{H}}(\nu, \tau)\, d\tau.$$

The WSSUS assumption is widely used in wireless channel modeling [16], [12], [2], [1], [41], [42]. 
It is in good agreement with measurements of tropospheric scattering channels [12], and provides a 
reasonable model for many types of mobile radio channels [43]-[45], at least over a limited time 
duration and bandwidth [16]. Furthermore, the scattering function can be directly estimated from 
measured data [46], [47], so that capacity expressions and bounds that explicitly depend on the 
channel's scattering function can be evaluated for many channels of practical interest. 

Formally, the WSSUS assumption is mathematically incompatible with the requirement that $\mathbb{H}$ is 
of Hilbert-Schmidt type, or, equivalently, that the system functions are square integrable, because 
stationarity in time $t$ and frequency $f$ of $L_{\mathbb{H}}(t, f)$ implies that $L_{\mathbb{H}}(t, f)$ cannot decay to zero for $t \to \infty$ 
and $f \to \infty$. Similarly to the engineering model of white noise, this incompatibility is a mathematical 
artifact and not a problem of real-world wireless channels: in fact, every communication system 
transmits over a finite time duration and over a finite bandwidth.⁴ We believe that the simplification 
the WSSUS assumption entails justifies this mathematical inconsistency. 

4 A more detailed account on solutions to overcome the mathematical incompatibility between stationary and finite-energy models 
can be found in [48, Sec. 7.5]. 

B. The Underspread Assumption and its Consequences 

Because the velocity of the transmitter, of the receiver, and of the objects in the propagation 
environment is limited, so is the maximum Doppler shift $\nu_0$ experienced by the transmitted signal. 
We also assume that the maximum delay is strictly smaller than $2\tau_0$. For simplicity and without 
loss of generality, throughout this paper, we consider scattering functions that are centered at $\tau = 0$ 
and $\nu = 0$, i.e., we remove any overall fixed delay and Doppler shift. The assumptions of limited 
Doppler shift and delay then imply that the scattering function is supported on a rectangle of 
spread $\Delta_{\mathbb{H}} = 4\nu_0\tau_0$, 

$$C_{\mathbb{H}}(\nu, \tau) = 0 \quad \text{for } (\nu, \tau) \notin [-\nu_0, \nu_0] \times [-\tau_0, \tau_0]. \tag{7}$$

Condition (7) in turn implies that the spreading function $S_{\mathbb{H}}(\nu, \tau)$ is also supported on the same 
rectangle with probability 1 (w.p.1). If $\Delta_{\mathbb{H}} < 1$, the channel is said to be underspread [16], [12], 
[20]. Virtually all channels in wireless communication are highly underspread, with $\Delta_{\mathbb{H}} \approx 10^{-3}$ 
for typical land-mobile channels and as low as $10^{-7}$ for some indoor channels with restricted 
mobility of the terminals [49]-[51]. The underspread property of typical wireless channels is very 
important, first because only (deterministic) underspread channels can be completely identified 
from measurements [52], [53], and second because underspread channels have a well-structured set 
of approximate eigenfunctions that can be used to discretize the channel operator, as described next. 
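
As a numerical illustration of the spread and of the power-delay profile, the sketch below uses a scattering function that is constant on the support rectangle and zero elsewhere. The flat shape, the function names, and the values of the maximum Doppler shift (50 Hz) and delay (5 microseconds) are assumptions made for this example only.

```python
def spread(nu0, tau0):
    """Area of the support rectangle [-nu0, nu0] x [-tau0, tau0]."""
    return 4.0 * nu0 * tau0


def brick_scattering(nu, tau, nu0, tau0, power=1.0):
    """Flat scattering function on the support rectangle, zero elsewhere."""
    inside = abs(nu) <= nu0 and abs(tau) <= tau0
    return power / spread(nu0, tau0) if inside else 0.0


def power_delay_profile(tau, nu0, tau0, steps=1000):
    """Midpoint-rule integral of the scattering function over Doppler."""
    d_nu = 2.0 * nu0 / steps
    return sum(brick_scattering(-nu0 + (i + 0.5) * d_nu, tau, nu0, tau0) * d_nu
               for i in range(steps))


# Illustrative values only: 50 Hz maximum Doppler shift, 5 us maximum delay.
nu0, tau0 = 50.0, 5e-6
delta = spread(nu0, tau0)                               # about 1e-3: underspread
profile_at_zero = power_delay_profile(0.0, nu0, tau0)   # about 1 / (2 * tau0)
```

For this flat example the power-delay profile is constant on the delay support and vanishes outside it, and the spread of about 1e-3 falls in the range quoted above for land-mobile channels.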

1) Approximate diagonalization of underspread channels: As $\mathbb{H}$ is a Hilbert-Schmidt operator, its 
kernel can be expressed in terms of its positive singular values $\{\sigma_i\}$, its left singular functions $\{u_i(t)\}$, 
and its right singular functions $\{v_i(t)\}$ [37, Th. 6.14.1], according to 

$$k_{\mathbb{H}}(t, t') = \sum_{i=-\infty}^{\infty} \sigma_i\, u_i(t)\, v_i^*(t'). \tag{8}$$

We denote by $\mathcal{N}_{\mathbb{H}}$ the null space of $\mathbb{H}$, i.e., the space of input signals that the channel maps onto 0. 
The set $\{v_i(t)\}$ is an orthonormal basis for the linear span of $\mathcal{L}^2 - \mathcal{N}_{\mathbb{H}}$, and $\{u_i(t)\}$ is an orthonormal 
basis for the range space $\mathcal{R}_{\mathbb{H}}$. Any input signal in $\mathcal{N}_{\mathbb{H}}$ is of no utility for communication purposes; 
the remaining input signals in the linear span of $\mathcal{L}^2 - \mathcal{N}_{\mathbb{H}}$, which we denote in the remainder of 
the paper as input space, can be completely characterized by their projections onto the set $\{v_i(t)\}$. 
Similarly, the output signal $r(t) = (\mathbb{H}x)(t)$ is completely described by its projections onto the 
set $\{u_i(t)\}$. These projections together with the kernel decomposition (8) yield a countable set of 
scalar input-output relations, which we refer to as the diagonalization of $\mathbb{H}$. 

Because the right and left singular functions depend on the realization of $\mathbb{H}$, diagonalization 
requires perfect channel knowledge. But this knowledge is not available in the noncoherent setting. 
In contrast, if the singular functions of the random channel $\mathbb{H}$ did not depend on its particular 
realization, we could diagonalize $\mathbb{H}$ without knowledge of the channel realization. This is the case, 
for example, for random linear time-invariant (LTI) channels, where complex sinusoids are always 
eigenfunctions, independently of the realization of the channel's impulse response. Fortunately, the 
singular functions of underspread random LTV channels can be well approximated by deterministic 
functions. More precisely, an underspread channel $\mathbb{H}$ has the following properties [20]: 

1) All realizations of the underspread channel $\mathbb{H}$ are approximately normal, so that the singular 
value decomposition (8) can be replaced by an eigenvalue decomposition. 

2) Any deterministic unit-energy signal $g(t)$ that is well localized⁵ in time and frequency is 
an approximate eigenfunction of $\mathbb{H}$ in the mean-square sense, i.e., the mean-square error 
$\mathbb{E}[\| \langle \mathbb{H}g, g \rangle g - \mathbb{H}g \|^2]$ is small if $\mathbb{H}$ is underspread. This error can be further reduced by an 
appropriate choice of $g(t)$, where the choice depends on the scattering function $C_{\mathbb{H}}(\nu, \tau)$. 

3) If $g(t)$ is an approximate eigenfunction as defined in the previous point, then so is $g^{(\alpha,\beta)}(t) = 
g(t - \alpha)\, e^{j2\pi\beta t}$ for any time shift $\alpha \in \mathbb{R}$ and any frequency shift $\beta \in \mathbb{R}$. 

4) For any $(\alpha, \beta)$, the time-varying transfer function $L_{\mathbb{H}}(\alpha, \beta)$ is an approximate eigenvalue of $\mathbb{H}$ 
corresponding to the approximate eigenfunction $g^{(\alpha,\beta)}(t)$, in the sense that the mean-square 
error $\mathbb{E}[| \langle \mathbb{H}g^{(\alpha,\beta)}, g^{(\alpha,\beta)} \rangle - L_{\mathbb{H}}(\alpha, \beta) |^2]$ is small. 
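
Property 2 can be illustrated in a deterministic special case. The sketch below assumes a channel consisting of a single scatterer, so that the channel output for input g is g(t - tau) exp(j 2 pi nu t), and a unit-energy Gaussian pulse as g; delay and Doppler are expressed in units normalized to the pulse. It evaluates the squared error of Property 2 by numerical integration. This is not the mean-square statement over the random channel, and the pulse shape, function names, and parameter values are hypothetical choices for illustration.

```python
import cmath
import math


def gaussian_pulse(t):
    """Unit-energy Gaussian pulse g(t) = 2**0.25 * exp(-pi * t**2)."""
    return 2.0 ** 0.25 * math.exp(-math.pi * t * t)


def eigen_error(tau, nu, half_width=8.0, dt=0.01):
    """Squared norm of <Hg, g> g - Hg for (Hg)(t) = g(t - tau) exp(j 2 pi nu t)."""
    steps = int(round(2 * half_width / dt))
    ts = [-half_width + i * dt for i in range(steps + 1)]
    hg = [gaussian_pulse(t - tau) * cmath.exp(2j * math.pi * nu * t) for t in ts]
    g = [gaussian_pulse(t) for t in ts]
    # Inner product <Hg, g>; g is real, so no conjugate is needed.
    inner = sum(a * b for a, b in zip(hg, g)) * dt
    return sum(abs(inner * b - a) ** 2 for a, b in zip(hg, g)) * dt


small_offsets = eigen_error(0.05, 0.05)   # scatterer close to the origin
large_offsets = eigen_error(0.5, 0.5)     # scatterer far from the origin
```

For this Gaussian pulse the error has the closed form 1 - exp(-pi (tau^2 + nu^2)), so it is small exactly when the delay and Doppler shift are small relative to the duration and bandwidth of the pulse.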

We use these properties of underspread operators to construct an approximation $\widetilde{\mathbb{H}}$ of the random 
channel $\mathbb{H}$ that has a well-structured set of deterministic eigenfunctions. The errors incurred by 
this approximation are discussed in detail in Appendix A. We then diagonalize this approximating 
operator and exclusively consider the corresponding discretized input-output relation in the remainder 
of the paper. Property 1, the approximate normality of $\mathbb{H}$, together with Property 2 implies that 
the kernel of the approximating operator $\widetilde{\mathbb{H}}$ can be synthesized as $\sum_{i=-\infty}^{\infty} \lambda_i\, z_i(t)\, z_i^*(t')$, where, 
differently from (8), the $\lambda_i$ are now random eigenvalues instead of random singular values, and 
the $z_i(t)$ constitute a set of deterministic orthonormal eigenfunctions instead of random singular 
functions. Property 2 means that we are at liberty to choose the approximate eigenfunctions $z_i(t)$ 
among all signals that are well localized in time and frequency. In particular, we would like the 
resulting approximating kernel to be convenient to work with and the approximate eigenfunctions $z_i(t)$ 
easy to implement, as discussed in Section II-B3; therefore, we choose the set of approximate 
eigenfunctions to be highly structured. By Property 3, it is possible to use time- and 
frequency-shifted versions of a single well-localized prototype function $g(t)$ as eigenfunctions. Furthermore, 
because the support of $S_{\mathbb{H}}(\nu, \tau)$ is strictly limited in Doppler $\nu$ and delay $\tau$, it follows from the 
sampling theorem and the Fourier transform relation (4) that the samples $L_{\mathbb{H}}(kT, nF)$, taken on a 
rectangular grid with $T < 1/(2\nu_0)$ and $F < 1/(2\tau_0)$, are sufficient to characterize $L_{\mathbb{H}}(t, f)$ exactly. 
Hence, we take as our set of approximate eigenfunctions the so-called Weyl-Heisenberg set $\{g_{k,n}(t)\}$, 
where $g_{k,n}(t) = g(t - kT)\, e^{j2\pi nFt}$ are orthonormal signals. The requirement that the $g_{k,n}(t)$ are 
orthonormal and at the same time well localized in time and frequency implies $TF > 1$ [54], as a 
consequence of the Balian-Low theorem [55, Ch. 8]. Large values of the product $TF$ allow for better 
time-frequency localization of $g(t)$, but result in a loss of dimensions in signal space compared 
with the critically sampled case $TF = 1$. The Nyquist condition $T < 1/(2\nu_0)$ and $F < 1/(2\tau_0)$ 
can be readily satisfied for all underspread channels. 

5 We measure the joint time-frequency localization of a signal $g(t)$ by the product between its effective duration and its effective 
bandwidth, defined in (64). 

The samples $L_{\mathbb{H}}(kT, nF)$ are approximate eigenvalues of $\mathbb{H}$ by Property 4; hence, our choice 
of approximate eigenfunctions results in the following approximating eigenvalue decomposition 
for $k_{\mathbb{H}}(t, t')$: 

$$k_{\mathbb{H}}(t, t') \approx k_{\widetilde{\mathbb{H}}}(t, t') = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} L_{\mathbb{H}}(kT, nF)\, g_{k,n}(t)\, g_{k,n}^*(t') \tag{9}$$

where $k_{\widetilde{\mathbb{H}}}(t, t')$ denotes the kernel of the approximating operator $\widetilde{\mathbb{H}}$. For $TF > 1$, the 
Weyl-Heisenberg set $\{g_{k,n}(t)\}$ is not complete in $\mathcal{L}^2$ [54, Th. 8.3.1]. Therefore, the null space of $\widetilde{\mathbb{H}}$ 
is nonempty. As $k_{\widetilde{\mathbb{H}}}(t, t')$ is only an approximation of $k_{\mathbb{H}}(t, t')$, this null space might differ from $\mathcal{N}_{\mathbb{H}}$. 
Similarly, the range space of $\widetilde{\mathbb{H}}$ might differ from $\mathcal{R}_{\mathbb{H}}$. The characterization of the difference between 
these spaces is an important open problem. 

2) Canonical characterization of signaling schemes: The approximating random channel 
operator $\widetilde{\mathbb{H}}$ has a highly structured set of deterministic orthonormal eigenfunctions. We can, therefore, 
diagonalize the input-output relation of the approximating channel without the need for channel 
knowledge at the transmitter or the receiver. Any input signal $x(t)$ that lies in the input space of 
the approximating operator is uniquely characterized by its projections onto the set $\{g_{k,n}(t)\}$. All 
physically realizable transmit signals are effectively band limited. As the prototype function $g(t)$ is 
well concentrated in frequency by construction, we can model the effective band limitation of $x(t)$ 
by using only a finite number of slots $N$ in frequency. The resulting transmitted signal 

$$x(t) = \sum_{k=-\infty}^{\infty} \sum_{n=0}^{N-1} \underbrace{\langle x, g_{k,n} \rangle}_{\triangleq\, x[k,n]}\, g_{k,n}(t) \tag{10}$$

then has effective bandwidth $W = NF$. We call the coefficient $x[k, n]$ the transmit symbol in the 
time-frequency slot $(k, n)$. The received signal can be expanded in the same basis. To compute the 






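resulting projections, we substitute $k_{\widetilde{\mathbb{H}}}(t, t')$ and the canonical input signal (10) into the integral 
input-output relation (1), add white Gaussian noise $w(t)$, and project the resulting noisy received 
signal $y(t) = (\widetilde{\mathbb{H}}x)(t) + w(t)$ onto the functions $\{g_{k,n}(t)\}$, i.e., 

$$\begin{aligned}
y[k, n] = \langle y, g_{k,n} \rangle &= \langle \widetilde{\mathbb{H}}x, g_{k,n} \rangle + \underbrace{\langle w, g_{k,n} \rangle}_{\triangleq\, w[k,n]} \\
&= \sum_{k',n'} x[k', n']\, \langle \widetilde{\mathbb{H}}g_{k',n'}, g_{k,n} \rangle + w[k, n] \\
&= \underbrace{L_{\mathbb{H}}(kT, nF)}_{\triangleq\, h[k,n]}\, x[k, n] + w[k, n]
\end{aligned} \tag{11}$$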



for all time-frequency slots $(k, n)$. The last step in (11) follows from the orthonormality of the 
set $\{g_{k,n}(t)\}$. Orthonormality also implies that the discretized noise signal $w[k, n]$ is JPG, 
independent and identically distributed (i.i.d.) over time $k$ and frequency $n$; for convenience, we normalize 
the noise variance so that $w[k, n] \sim \mathcal{CN}(0, 1)$ for all $k$ and $n$. The diagonalized input-output 
relation (11) is completely generic, i.e., it is not limited to a specific signaling scheme. 
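
The diagonalized relation (11) is simple to simulate. The following is a minimal sketch, assuming a channel made of a few point scatterers, for which inverting the Fourier relation (4) gives a transfer function that is a sum of terms gain * exp(j 2 pi (nu t - tau f)). The scatterer list, the grid parameters, and all function names are hypothetical; a WSSUS channel would instead draw the gains at random according to the scattering function.

```python
import cmath
import random


def channel_coefficient(k, n, T, F, paths):
    """h[k, n]: transfer function sampled at (kT, nF) for discrete scatterers.

    paths: list of (gain, doppler_hz, delay_s) tuples; each contributes
    gain * exp(j*2*pi*(doppler*t - delay*f)) to the transfer function.
    """
    t, f = k * T, n * F
    return sum(a * cmath.exp(2j * cmath.pi * (nu * t - tau * f))
               for a, nu, tau in paths)


def complex_gaussian(rng, variance=1.0):
    """One sample of a proper complex Gaussian with the given variance."""
    s = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def transmit(x, T, F, paths, rng, noise_variance=1.0):
    """Apply y[k,n] = h[k,n] x[k,n] + w[k,n] to a dict {(k, n): symbol}."""
    return {(k, n): channel_coefficient(k, n, T, F, paths) * s
                    + complex_gaussian(rng, noise_variance)
            for (k, n), s in x.items()}


# Two transmit symbols on a hypothetical grid with T = 0.1 ms and F = 10 kHz.
symbols = {(0, 0): 1 + 0j, (1, 2): 2j}
received = transmit(symbols, 1e-4, 1e4, [(1.0, 0.0, 0.0)],
                    random.Random(0), noise_variance=0.0)
```

With a single scatterer at zero delay and zero Doppler shift and the noise switched off, every slot is passed through unchanged, which gives a quick sanity check of the sign conventions.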



3) OFDM interpretation of the approximating channel model: The canonical signaling scheme (10) 
and the corresponding discretized input-output relation (11) are not just tools to analyze channel 
capacity, but also lead to a practical transmission system. The decomposition of the channel input 
signal (10) can be interpreted as pulse-shaped (PS) OFDM [56], where discrete data symbols $x[k, n]$ 
are modulated onto a set of orthogonal signals, indexed by $k$ and $n$. In addition, this perspective leads 
to an operational interpretation of the error incurred when approximating $k_{\mathbb{H}}(t, t')$ as in (9). The 
time- and frequency-dispersive nature of LTV channels leads to intersymbol interference (ISI) and 
intercarrier interference (ICI) in the received PS-OFDM signal. This is apparent if we project $r(t)$ 
onto the function $g_{k,n}(t)$: 

$$\begin{aligned}
\langle r, g_{k,n} \rangle = \langle \mathbb{H}x, g_{k,n} \rangle &= \sum_{k'=-\infty}^{\infty} \sum_{n'=0}^{N-1} x[k', n']\, \langle \mathbb{H}g_{k',n'}, g_{k,n} \rangle \\
&= \langle \mathbb{H}g_{k,n}, g_{k,n} \rangle\, x[k, n] + \mathop{\sum_{k'=-\infty}^{\infty} \sum_{n'=0}^{N-1}}_{(k',n') \neq (k,n)} x[k', n']\, \langle \mathbb{H}g_{k',n'}, g_{k,n} \rangle.
\end{aligned} \tag{12}$$



The second term on the right-hand side (RHS) of (12) corresponds to ISI and ICI, while the first 
term is the desired signal; we can approximate the first term as $L_{\mathbb{H}}(kT, nF)\, x[k, n]$ by Property 4. 
Comparison of (11) and (12) then shows that the input-output relation (11), which results from the 
approximation (9), can be interpreted as PS-OFDM transmission over the original channel $\mathbb{H}$ if 
all ISI and ICI terms are neglected. 

With proper design of the prototype signal $g(t)$ and choice of the grid parameters $T$ and $F$, both 
ISI and ICI can be reduced [56]-[58]. The larger the product $TF$, the more effective the reduction 
in ISI and ICI, as discussed in Appendix A. Heuristically, a good compromise between loss of 
dimensions in signal space and reduction of the interference terms seems to result for $TF \approx 1.2$ [56], 
[58]. The cyclic prefix (CP) in a conventional CP-OFDM system incurs a similar dimension loss. 



In (72), we provide an upper bound on the mean-square energy of the interference term in (12), and 
show how this upper bound can be minimized by a careful choice of the signal $g(t)$ and of the 
grid parameters $T$ and $F$ [20], [17], [58]. For general scattering functions, the optimization of the 
triple $\{g(t), T, F\}$ needs to be performed numerically; a general guideline is to choose $T$ and $F$ 
such that (see Appendix A) 

$$\frac{T}{F} = \frac{\tau_0}{\nu_0}. \tag{13}$$

To summarize, in this section we constructed an approximation $\widetilde{\mathbb{H}}$ of the random linear operator $\mathbb{H}$
on the basis of the underspread property. The kernel of the approximating operator is synthesized
from the Weyl-Heisenberg set $\{g_{k,n}(t)\}$ as in (9), so that $\{g_{k,n}(t)\}$ is an orthonormal basis for the
input space and the range space of $\widetilde{\mathbb{H}}$. The decomposition of the input signal (10) can be interpreted
as PS-OFDM: this interpretation sheds light on one of the errors resulting from the approximation (9).
Finally, an important open problem is the characterization of the difference between the input spaces
of $\mathbb{H}$ and $\widetilde{\mathbb{H}}$, and between the range spaces of $\mathbb{H}$ and $\widetilde{\mathbb{H}}$.

C. Linear Time-Invariant and Linear Frequency-Invariant Channels



The properties of LTV underspread channels we listed in Section II-B are similar to the properties 
of LTI and linear frequency-invariant (LFI) channels: both LTI and LFI channel operators are normal 
and have a well-structured set of deterministic eigenfunctions (sinusoids parametrized by frequency
for LTI channels, and Dirac functions parametrized by time for LFI channels), with corresponding
eigenvalues equal to the samples of a channel system function (e.g., the transfer function in the
LTI case). Intuitively, LTI and LFI channels are limiting cases within the class of LTV channels
analyzed in this section; in fact, an LTV channel reduces to an LTI channel when $\nu_0 = 0$, and to
an LFI channel when $\tau_0 = 0$. Both LTI and LFI channels are then underspread, according to our
definition. Yet, since LTI and LFI channel operators are not of Hilbert-Schmidt type [59, App. A],
the kernel diagonalization presented in Section II-B does not apply to these two classes of channels;
consequently, the capacity bounds we derive in Sections III and IV do not reduce to capacity bounds
for the LTI or the LFI case when $\nu_0 = 0$ or $\tau_0 = 0$, respectively.⁶

Quasi-LTI channels, i.e., channels that are slowly time varying ($\nu_0$ small but positive), and
quasi-LFI channels, i.e., channels that are slowly frequency varying ($\tau_0$ small but positive), can instead
be approximately diagonalized as described in Section II-B as long as they are underspread.



D. Discrete-Time Discrete-Frequency Input-Output Relation 

The discrete-time discrete-frequency channel coefficients {h[k, n]} constitute a two-dimensional 
discrete-parameter stationary random process that is JPG with zero mean and correlation function 

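$$
R_{\mathbb{H}}[k,n] = \mathbb{E}\big[h[k'+k, n'+n]\, h^*[k',n']\big]
= \mathbb{E}\big[L_{\mathbb{H}}((k'+k)T, (n'+n)F)\, L_{\mathbb{H}}^*(k'T, n'F)\big]. \tag{14}
$$

The two-dimensional power spectral density of $\{h[k, n]\}$ is defined as

$$
c(\theta,\varphi) = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} R_{\mathbb{H}}[k,n]\, e^{-j2\pi(k\theta - n\varphi)}, \qquad |\theta|, |\varphi| \le 1/2. \tag{15}
$$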

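We shall often need the following expression for $c(\theta, \varphi)$ in terms of the scattering function $C_{\mathbb{H}}(\nu, \tau)$:

$$
\begin{aligned}
c(\theta,\varphi)
&\overset{(a)}{=} \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} e^{-j2\pi(k\theta - n\varphi)} \int_{\nu}\int_{\tau} C_{\mathbb{H}}(\nu,\tau)\, e^{j2\pi(kT\nu - nF\tau)}\, d\tau\, d\nu \\
&= \int_{\nu}\int_{\tau} C_{\mathbb{H}}(\nu,\tau) \sum_{k=-\infty}^{\infty} e^{j2\pi k(T\nu - \theta)} \sum_{n=-\infty}^{\infty} e^{-j2\pi n(F\tau - \varphi)}\, d\tau\, d\nu \\
&= \int_{\nu}\int_{\tau} C_{\mathbb{H}}(\nu,\tau) \sum_{k=-\infty}^{\infty} \delta(T\nu - \theta - k) \sum_{n=-\infty}^{\infty} \delta(F\tau - \varphi - n)\, d\tau\, d\nu \\
&\overset{(b)}{=} \frac{1}{TF} \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} C_{\mathbb{H}}\!\left(\frac{\theta - k}{T}, \frac{\varphi - n}{F}\right)
\end{aligned}
\tag{16}
$$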

⁶ For deterministic LTI channels, a channel discretization that is useful for information-theoretic analysis is discussed in [13, Sec. 8.5].

where (a) follows from the Fourier transform relation (6), and (b) results from Poisson's summation
formula. The variance of each channel coefficient is given by

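$$
\begin{aligned}
\sigma_{\mathbb{H}}^2 = \mathbb{E}\big[|h[k,n]|^2\big]
&= \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} c(\theta,\varphi)\, d\theta\, d\varphi \\
&\overset{(a)}{=} \frac{1}{TF} \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} C_{\mathbb{H}}\!\left(\frac{\theta - k}{T}, \frac{\varphi - n}{F}\right) d\theta\, d\varphi \\
&\overset{(b)}{=} \frac{1}{TF} \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} C_{\mathbb{H}}\!\left(\frac{\theta}{T}, \frac{\varphi}{F}\right) d\theta\, d\varphi \\
&\overset{(c)}{=} \int_{\nu}\int_{\tau} C_{\mathbb{H}}(\nu,\tau)\, d\tau\, d\nu
\end{aligned}
\tag{17}
$$

where (a) follows from (16), and (b) results because we chose the grid parameters to satisfy the
Nyquist conditions $T \le 1/(2\nu_0)$ and $F \le 1/(2\tau_0)$, so that periodic repetitions of the compactly
supported scattering function lie outside the integration region. Finally, (c) follows from the change
of variables $\nu = \theta/T$ and $\tau = \varphi/F$. For ease of notation, we normalize $\sigma_{\mathbb{H}}^2 = 1$ throughout the
paper.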
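The variance computation above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes a brick-shaped scattering function (constant on its support, with unit total power) and hypothetical values of T, F, the maximum delay, and the maximum Doppler shift chosen only so that the support occupies a visible fraction of the unit square. The function name `sampled_psd` is illustrative.

```python
import numpy as np

def sampled_psd(theta, phi, T, F, tau0, nu0):
    """c(theta, phi) on |theta|, |phi| <= 1/2 for a brick-shaped C_H.

    Under the Nyquist conditions only the k = n = 0 replica falls in the
    unit square, so c = C_H(theta/T, phi/F) / (T*F).
    """
    spread = 4 * nu0 * tau0
    inside = (np.abs(theta) <= nu0 * T) & (np.abs(phi) <= tau0 * F)
    return inside / (T * F * spread)

# Hypothetical values chosen so that nu0*T = 0.1 and tau0*F = 0.2.
T, F, tau0, nu0 = 1.0, 1.0, 0.2, 0.1
nyquist_ok = T <= 1 / (2 * nu0) and F <= 1 / (2 * tau0)

M = 1000                                   # midpoint-rule points per axis
grid = (np.arange(M) + 0.5) / M - 0.5
theta, phi = np.meshgrid(grid, grid, indexing="ij")
variance = sampled_psd(theta, phi, T, F, tau0, nu0).sum() / M**2
```

With these values the midpoint rule recovers the unit variance implied by the normalization of the scattering function.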

For each time slot k, we arrange the discretized input signal x[k,n], the discretized output 
signal y[k,n], the channel coefficients h[k,n], and the noise samples w[k,n] in corresponding 
vectors. For example, the $N$-dimensional vector that contains the input symbols in the $k$th time slot
is defined as

$$
\mathbf{x}[k] = \big[\, x[k,0] \;\; x[k,1] \;\; \cdots \;\; x[k,N-1] \,\big]^T.
$$

The output vector $\mathbf{y}[k]$, the channel vector $\mathbf{h}[k]$, and the noise vector $\mathbf{w}[k]$ are defined analogously.



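This notation allows us to rewrite the input-output relation (11) as

$$
\mathbf{y}[k] = \mathbf{h}[k] \odot \mathbf{x}[k] + \mathbf{w}[k] \tag{18}
$$

for all $k$. In this formulation, the channel is a multivariate stationary process $\{\mathbf{h}[k]\}$ with
matrix-valued correlation function

$$
\mathbf{R}_{\mathbf{h}}[k] = \mathbb{E}\big[\mathbf{h}[k'+k]\,\mathbf{h}^H[k']\big] =
\begin{bmatrix}
R_{\mathbb{H}}[k,0] & R_{\mathbb{H}}^*[k,1] & \cdots & R_{\mathbb{H}}^*[k,N-1] \\
R_{\mathbb{H}}[k,1] & R_{\mathbb{H}}[k,0] & \cdots & R_{\mathbb{H}}^*[k,N-2] \\
\vdots & \vdots & \ddots & \vdots \\
R_{\mathbb{H}}[k,N-1] & R_{\mathbb{H}}[k,N-2] & \cdots & R_{\mathbb{H}}[k,0]
\end{bmatrix}. \tag{19}
$$
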
In most of the following analyses, we initially consider a finite number $K$ of time slots and then
take the limit $K \to \infty$. To obtain a compact notation, we stack $K$ contiguous elements of the
multivariate input, channel, and output processes just defined. For the channel input, this results in
the $KN$-dimensional vector

$$
\mathbf{x} = \big[\, \mathbf{x}^T[0] \;\; \mathbf{x}^T[1] \;\; \cdots \;\; \mathbf{x}^T[K-1] \,\big]^T. \tag{20}
$$



Again, the stacked vectors $\mathbf{y}$, $\mathbf{h}$, and $\mathbf{w}$ are defined analogously. With these definitions, we can now
compactly express the input-output relation (11) as

$$
\mathbf{y} = \mathbf{x} \odot \mathbf{h} + \mathbf{w}. \tag{21}
$$



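We denote the correlation matrix of the stacked channel vector $\mathbf{h}$ by $\mathbf{R}_{\mathbf{h}} = \mathbb{E}[\mathbf{h}\mathbf{h}^H]$. Because
the channel process $\{h[k, n]\}$ is stationary in time and in frequency, $\mathbf{R}_{\mathbf{h}}$ is a two-level Hermitian
Toeplitz matrix, given by

$$
\mathbf{R}_{\mathbf{h}} =
\begin{bmatrix}
\mathbf{R}_{\mathbf{h}}[0] & \mathbf{R}_{\mathbf{h}}^H[1] & \cdots & \mathbf{R}_{\mathbf{h}}^H[K-1] \\
\mathbf{R}_{\mathbf{h}}[1] & \mathbf{R}_{\mathbf{h}}[0] & \cdots & \mathbf{R}_{\mathbf{h}}^H[K-2] \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{R}_{\mathbf{h}}[K-1] & \mathbf{R}_{\mathbf{h}}[K-2] & \cdots & \mathbf{R}_{\mathbf{h}}[0]
\end{bmatrix}. \tag{22}
$$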
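The two-level Toeplitz structure in (22) can be made concrete with a short sketch. It assumes a brick-shaped scattering function, for which the Fourier integral defining the channel correlation function evaluates to a product of two sinc functions; this closed form and the helper names below are illustrative assumptions, not taken from the text. In this separable, real-valued case the stacked correlation matrix is a Kronecker product of a time-domain and a frequency-domain Toeplitz matrix.

```python
import numpy as np

def toeplitz_from_sequence(r):
    """Real symmetric Toeplitz matrix whose (i, j) entry is r[|i - j|]."""
    r = np.asarray(r)
    idx = np.abs(np.subtract.outer(np.arange(len(r)), np.arange(len(r))))
    return r[idx]

def stacked_channel_correlation(K, N, T, F, tau0, nu0):
    """R_h = E[h h^H] for a brick-shaped scattering function (separable case)."""
    r_time = np.sinc(2 * nu0 * T * np.arange(K))   # R_H[k, 0]; np.sinc(x) = sin(pi x)/(pi x)
    r_freq = np.sinc(2 * tau0 * F * np.arange(N))  # R_H[0, n]
    # Entries are stacked first in frequency, then in time, so the time
    # Toeplitz matrix indexes the blocks and the frequency one fills them.
    return np.kron(toeplitz_from_sequence(r_time), toeplitz_from_sequence(r_freq))

K, N = 4, 8   # small illustrative sizes
R_h = stacked_channel_correlation(K, N, T=0.35e-3, F=3.53e3, tau0=0.5e-6, nu0=5.0)
```

Each N x N block of `R_h` is itself Toeplitz, and the blocks depend only on the difference of the time-slot indices, which is exactly the two-level structure described above.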



E. Power Constraints 



Throughout the paper, we assume that the average power of the transmitted signal is constrained
as $(1/T)\,\mathbb{E}[\|\mathbf{x}\|^2] \le KP$. In addition, we limit the peak power to be no larger than $\beta$ times the
average power, where $\beta \ge 1$ is the nominal peak-to-average-power ratio (PAPR).

The multivariate input-output relation (21) allows us to constrain the peak power in several different
ways. We analyze the following two cases:

1) Peak constraint in time: The power of the transmitted signal in each time slot $k$ is limited as

$$
\frac{1}{T} \sum_{n=0}^{N-1} |x[k,n]|^2 \le \beta P \quad \text{w.p.1.} \tag{23}
$$

This constraint models the fact that physically realizable power amplifiers can only provide
limited output power [4].

2) Peak constraint in time and frequency: Regulatory bodies sometimes limit the peak power in
certain frequency bands, e.g., for UWB systems. We model this type of constraint by imposing
a limit on the squared amplitude of the transmitted symbols $x[k, n]$ in each time-frequency
slot $(k,n)$ according to

$$
\frac{1}{T}\, |x[k,n]|^2 \le \frac{\beta P}{N} \quad \text{w.p.1.} \tag{24}
$$

This type of constraint is more stringent than the peak constraint in time given in (23).
Both peak constraints above are imposed on the input symbols $x[k, n]$, i.e., in the eigenspace of
the approximating channel operator. This limitation is mathematically convenient; however, the
peak value of the corresponding transmitted continuous-time signal $x(t)$ in (10) also depends on the
prototype signal $g(t)$, so that a limit on $x[k, n]$ does not generally imply that $x(t)$ is peak limited.

III. Capacity Bounds under a Peak Constraint in Time and Frequency



In the present section, we analyze the capacity of the discretized channel in (11) subject to the peak
constraint in time and frequency specified by (24). The link between the discretized channel (11) and
the continuous-time channel model established in Section II then allows us to express the resulting
bounds in terms of the scattering function $C_{\mathbb{H}}(\nu, \tau)$ of the underspread WSSUS channel $\mathbb{H}$.

As we assumed that the channel process $\{h[k, n]\}$ has a spectral density [given in (16)], the vector
process $\{\mathbf{h}[k]\}$ is ergodic [60] and the capacity of the discretized underspread channel (21) is given
by [61, Ch. 12]

$$
C(W) = \lim_{K \to \infty} \frac{1}{KT} \sup_{\mathcal{Q}} I(\mathbf{y}; \mathbf{x}) \quad [\text{nat/s}] \tag{25}
$$

for a given bandwidth $W = NF$. Here, the supremum is taken over the set $\mathcal{Q}$ of all input distributions
that satisfy the peak constraint (24) and the average-power constraint $\mathbb{E}[\|\mathbf{x}\|^2] \le KPT$.



The capacity of fading channels with finite bandwidth has so far resisted all attempts at closed-form
solutions [62], [22], [63], even for the memoryless case; thus, we resort to bounds to characterize
the capacity (25). In particular, we present the following bounds:

• An upper bound $U_c(W)$, which we refer to as coherent upper bound, that is based on the
assumption that the receiver has perfect knowledge of the channel realizations. This bound is
standard; it turns out to be useful for small bandwidth.

• An upper bound $U_1(W)$ that is useful for medium to large bandwidth. This bound is explicit in
the channel's scattering function and extends the upper bound [28, Prop. 2.2] on the capacity
of frequency-flat time-selective channels to general underspread channels that are selective in
time and frequency.

• A lower bound $L_1(W)$ that extends the lower bound [27, Prop. 2.2] to general underspread
channels that are selective in time and frequency. This bound is explicit in the channel's
scattering function only for large bandwidth.

A. Coherent Upper Bound 

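The assumption that the receiver perfectly knows the instantaneous channel realizations furnishes
the following capacity upper bound:

$$
\begin{aligned}
\frac{1}{KT} \sup_{\mathcal{Q}} I(\mathbf{y}; \mathbf{x})
&\overset{(a)}{\le} \frac{1}{KT} \sup_{\mathcal{Q}} I(\mathbf{y}; \mathbf{x} \,|\, \mathbf{h}) \\
&\overset{(b)}{\le} \frac{1}{KT} \sup_{\mathbb{E}[\|\mathbf{x}\|^2] \le KPT} I(\mathbf{y}; \mathbf{x} \,|\, \mathbf{h}) \\
&\overset{(c)}{=} \frac{1}{KT} \sup_{\operatorname{tr}(\mathbf{R}_{\mathbf{x}}) \le KPT} \mathbb{E}_{\mathbf{h}}\Big[\log\det\big(\mathbf{I}_{KN} + (\mathbf{h}\mathbf{h}^H) \odot \mathbf{R}_{\mathbf{x}}\big)\Big] \\
&\overset{(d)}{\le} \frac{N}{T}\, \mathbb{E}_{h}\Big[\log\Big(1 + \frac{PT}{N}\, |h|^2\Big)\Big].
\end{aligned}
\tag{26}
$$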

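Here, (a) holds because the coherent mutual information, $I(\mathbf{y}; \mathbf{x} \,|\, \mathbf{h})$, is an upper bound on the
corresponding mutual information in the noncoherent setting. Inequality (b) follows as we drop the
peak constraint and thus enlarge the set of admissible input distributions. The supremum of $I(\mathbf{y}; \mathbf{x} \,|\, \mathbf{h})$
over the resulting relaxed input constraint is achieved by a zero-mean JPG input vector $\mathbf{x}$ with
covariance matrix $\mathbf{R}_{\mathbf{x}} = \mathbb{E}[\mathbf{x}\mathbf{x}^H]$ that satisfies $\operatorname{tr}(\mathbf{R}_{\mathbf{x}}) \le KPT$ [3]. To obtain (c), we use that,
conditioned on $\mathbf{h}$, the output vector $\mathbf{y}$ is JPG and its covariance matrix can be expressed as

$$
\mathbb{E}\big[\mathbf{y}\mathbf{y}^H \,|\, \mathbf{h}\big] = \mathbf{I}_{KN} + \mathbb{E}_{\mathbf{x}}\big[(\mathbf{x} \odot \mathbf{h})(\mathbf{x} \odot \mathbf{h})^H\big] = \mathbf{I}_{KN} + (\mathbf{h}\mathbf{h}^H) \odot \mathbf{R}_{\mathbf{x}}
$$

where the last equality results from the following elementary relation between Hadamard products
and outer products:

$$
(\mathbf{x} \odot \mathbf{h})(\mathbf{x} \odot \mathbf{h})^H = \mathbf{x}\mathbf{x}^H \odot \mathbf{h}\mathbf{h}^H. \tag{27}
$$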
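The relation (27) between Hadamard products and outer products is easy to confirm numerically. The snippet below is a minimal sketch with randomly drawn complex vectors of a small, arbitrary dimension standing in for the stacked input and channel vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6  # illustrative dimension (stands in for KN)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)

s = x * h                                              # Hadamard product of x and h
lhs = np.outer(s, s.conj())                            # (x . h)(x . h)^H
rhs = np.outer(x, x.conj()) * np.outer(h, h.conj())    # (x x^H) . (h h^H), entrywise
identity_holds = np.allclose(lhs, rhs)
```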
Finally, (d) follows from Hadamard's inequality, from the fact that by Jensen's inequality the
supremum is achieved by $\mathbf{R}_{\mathbf{x}} = (PT/N)\,\mathbf{I}_{KN}$, and because the channel coefficients all have the
same distribution $h[k, n] \sim h \sim \mathcal{CN}(0, 1)$. As the upper bound (26) does not depend on $K$, we
obtain an upper bound $U_c(W)$ on capacity (25) as a function of bandwidth $W$ if we set $W = NF$:

$$
C(W) \le U_c(W) = \frac{W}{TF}\, \mathbb{E}_{h}\Big[\log\Big(1 + \frac{PTF}{W}\, |h|^2\Big)\Big]. \tag{28}
$$



For a discretization of the WSSUS channel $\mathbb{H}$ different from the one in Section II-B, Medard
and Gallager [8] showed that the corresponding capacity vanishes with increasing bandwidth if
the peakiness of the input signal is constrained in a way that includes our peak constraint (24).
As the upper bound $U_c(W)$ monotonically increases in $W$, it is sensible to conclude that $U_c(W)$
does not accurately reflect the capacity behavior for large bandwidth. However, we demonstrate
in Section III-D by means of a numerical example that $U_c(W)$ can be quite useful for small and
medium bandwidth.
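The coherent upper bound in (28) involves only a one-dimensional expectation over an exponentially distributed random variable, since the squared magnitude of a unit-variance circularly symmetric complex Gaussian coefficient is exponential with unit mean. The sketch below approximates that expectation with Gauss-Laguerre quadrature; the quadrature order and the function name are illustrative choices, and the result is a numerical approximation rather than an exact evaluation.

```python
import numpy as np

def coherent_upper_bound(W, P, TF, quad_order=64):
    """Approximate U_c(W) = (W/TF) * E[log(1 + (P*TF/W) * |h|^2)] in nat/s."""
    # Gauss-Laguerre nodes/weights approximate integrals of exp(-z) f(z) on [0, inf),
    # which is the expectation of f(|h|^2) for |h|^2 ~ Exp(1).
    nodes, weights = np.polynomial.laguerre.laggauss(quad_order)
    snr_per_slot = P * TF / W
    expectation = np.sum(weights * np.log1p(snr_per_slot * nodes))
    return (W / TF) * expectation

P, TF = 2.42e7, 1.25                  # values used later in the numerical example
bandwidths = np.logspace(6, 12, 7)    # 1 MHz to 1 THz
u_c = np.array([coherent_upper_bound(W, P, TF) for W in bandwidths])
```

On this grid the values increase with bandwidth and stay below P, in line with the monotonicity noted above.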

B. An Upper Bound for Large but Finite Bandwidth 

To better understand the capacity behavior at large bandwidth, we derive an upper bound $U_1(W)$
that captures the effect of diminishing capacity in the large-bandwidth regime. The upper bound
$U_1(W)$ is explicit in the channel's scattering function $C_{\mathbb{H}}(\nu, \tau)$.

1) The upper bound:

Theorem 1: Consider an underspread Rayleigh-fading channel with scattering function $C_{\mathbb{H}}(\nu, \tau)$;
assume that the channel input $\mathbf{x}$ satisfies the average-power constraint $\mathbb{E}[\|\mathbf{x}\|^2] \le KPT$ and
the peak constraint $|x[k,n]|^2 \le \beta PT/N$ w.p.1. The capacity of this channel is upper-bounded
as $C(W) \le U_1(W)$, where

$$
U_1(W) = \frac{W}{TF} \log\Big(1 + \alpha(W)\, \frac{PTF}{W}\Big) - \alpha(W)\, A(W) \tag{29a}
$$

with

$$
\alpha(W) = \min\Big\{ 1,\; \frac{W}{TF} \Big( \frac{1}{A(W)} - \frac{1}{P} \Big) \Big\} \tag{29b}
$$

and

$$
A(W) = \frac{W}{\beta} \int_{\nu}\int_{\tau} \log\Big(1 + \frac{\beta P}{W}\, C_{\mathbb{H}}(\nu, \tau)\Big)\, d\tau\, d\nu. \tag{29c}
$$
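Proof: To bound $\sup_{\mathcal{Q}} I(\mathbf{y}; \mathbf{x})$, we first use the chain rule for mutual information, $I(\mathbf{y}; \mathbf{x}) =
I(\mathbf{y}; \mathbf{x}, \mathbf{h}) - I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{x})$. Next, we split the supremum over $\mathcal{Q}$ into two parts, similarly as in the
proof of [28, Prop. 2.2]: one supremum over a restricted set of input distributions $\mathcal{Q}|_{\alpha}$ that satisfy
the peak constraint (24) and have a prescribed average power, i.e., $\mathbb{E}[\|\mathbf{x}\|^2] = \alpha KPT$ for some
fixed parameter $\alpha \in [0, 1]$, and another supremum over the parameter $\alpha$. Both steps together yield
the upper bound

$$
\begin{aligned}
\sup_{\mathcal{Q}} I(\mathbf{y}; \mathbf{x})
&= \sup_{\mathcal{Q}} \big\{ I(\mathbf{y}; \mathbf{x}, \mathbf{h}) - I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{x}) \big\} \\
&= \sup_{0 \le \alpha \le 1}\, \sup_{\mathcal{Q}|_{\alpha}} \big\{ I(\mathbf{y}; \mathbf{x}, \mathbf{h}) - I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{x}) \big\} \\
&\le \sup_{0 \le \alpha \le 1} \Big\{ \sup_{\mathcal{Q}|_{\alpha}} I(\mathbf{y}; \mathbf{x}, \mathbf{h}) - \inf_{\mathcal{Q}|_{\alpha}} I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{x}) \Big\}.
\end{aligned}
\tag{30}
$$

Next, we bound the two terms inside the braces individually. While standard steps suffice for the
bound on the first term, the second term requires some more effort; we relegate some of the more
technical steps to Appendix B.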

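a) Upper bound on the first term: The output vector $\mathbf{y}$ depends on the input vector $\mathbf{x}$ only
through $\mathbf{s} = \mathbf{x} \odot \mathbf{h}$, so that $I(\mathbf{y}; \mathbf{x}, \mathbf{h}) = I(\mathbf{y}; \mathbf{s})$. To upper-bound the mutual information $I(\mathbf{y}; \mathbf{s})$,
we take $\mathbf{s}$ as JPG with zero mean and covariance matrix $\mathbb{E}[\mathbf{s}\mathbf{s}^H] = \mathbb{E}[\mathbf{x}\mathbf{x}^H] \odot \mathbf{R}_{\mathbf{h}}$. An upper bound
on the first term inside the braces in (30) now results if we drop the peak constraint on $\mathbf{s}$. Then,

$$
\begin{aligned}
\sup_{\mathcal{Q}|_{\alpha}} I(\mathbf{y}; \mathbf{x}, \mathbf{h})
&\le \sup_{\mathbb{E}[\|\mathbf{x}\|^2] = \alpha KPT} \log\det\big(\mathbf{I}_{KN} + \mathbb{E}[\mathbf{x}\mathbf{x}^H] \odot \mathbf{R}_{\mathbf{h}}\big) \\
&\overset{(a)}{\le} \sup_{\mathbb{E}[\|\mathbf{x}\|^2] = \alpha KPT} \sum_{k=0}^{K-1} \sum_{n=0}^{N-1} \log\big(1 + \mathbb{E}[|x[k,n]|^2]\big) \\
&\overset{(b)}{\le} KN \log\Big(1 + \frac{\alpha PT}{N}\Big)
\end{aligned}
\tag{31}
$$

where (a) follows from Hadamard's inequality and (b) from Jensen's inequality.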

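b) Lower bound on the second term: We use the fact that the channel $\mathbf{h}$ is JPG, so that $I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{x}) =
\mathbb{E}_{\mathbf{x}}\big[\log\det\big(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_{\mathbf{h}}\big)\big]$. Next, we expand the expectation operator as follows:

$$
\begin{aligned}
\inf_{\mathcal{Q}|_{\alpha}} I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{x})
&= \inf_{\mathcal{Q}|_{\alpha}} \mathbb{E}_{\mathbf{x}}\big[\log\det\big(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_{\mathbf{h}}\big)\big] \\
&= \inf_{Q \in \mathcal{Q}|_{\alpha}} \int_{\mathcal{X}} \frac{\log\det\big(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_{\mathbf{h}}\big)}{\|\mathbf{x}\|^2}\, \|\mathbf{x}\|^2\, dQ
\end{aligned}
\tag{32}
$$

where $\mathcal{X} = \{\mathbf{x} \in \mathbb{C}^{KN} : |x[k, n]|^2 \le \beta PT/N,\ \forall k, n\}$ is the integration domain because the input
distribution $Q$ satisfies the peak constraint (24). Both factors under the integral are nonnegative;
hence, we obtain a lower bound on the expectation if we replace the first factor by its infimum
over $\mathcal{X}$: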



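$$
\begin{aligned}
\inf_{\mathcal{Q}|_{\alpha}} I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{x})
&\ge \inf_{Q \in \mathcal{Q}|_{\alpha}} \int_{\mathcal{X}} \Big( \inf_{\tilde{\mathbf{x}} \in \mathcal{X}} \frac{\log\det\big(\mathbf{I}_{KN} + (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H) \odot \mathbf{R}_{\mathbf{h}}\big)}{\|\tilde{\mathbf{x}}\|^2} \Big) \|\mathbf{x}\|^2\, dQ \\
&= \inf_{\tilde{\mathbf{x}} \in \mathcal{X}} \frac{\log\det\big(\mathbf{I}_{KN} + (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H) \odot \mathbf{R}_{\mathbf{h}}\big)}{\|\tilde{\mathbf{x}}\|^2} \cdot \underbrace{\inf_{Q \in \mathcal{Q}|_{\alpha}} \int_{\mathcal{X}} \|\mathbf{x}\|^2\, dQ}_{\mathbb{E}[\|\mathbf{x}\|^2] = \alpha KPT} \\
&= \alpha KPT \inf_{\tilde{\mathbf{x}} \in \mathcal{X}} \frac{\log\det\big(\mathbf{I}_{KN} + (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H) \odot \mathbf{R}_{\mathbf{h}}\big)}{\|\tilde{\mathbf{x}}\|^2}.
\end{aligned}
\tag{33}
$$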



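As the matrix $\mathbf{R}_{\mathbf{h}}$ is positive semidefinite, the above infimum is achieved on the boundary of the
admissible set [26, Sec. VI.A], i.e., by a vector $\tilde{\mathbf{x}}$ whose entries satisfy $|\tilde{x}[k, n]|^2 \in \{0, \beta PT/N\}$.
We use this fact and the relation between mutual information and MMSE, recently discovered
by Guo et al. [35], to further lower-bound the infimum on the RHS in (33). The corresponding
derivation is detailed in Appendix B; it results in

$$
\inf_{\tilde{\mathbf{x}} \in \mathcal{X}} \frac{\log\det\big(\mathbf{I}_{KN} + (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H) \odot \mathbf{R}_{\mathbf{h}}\big)}{\|\tilde{\mathbf{x}}\|^2}
\ge \frac{N}{\beta PT} \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} \log\Big(1 + \frac{\beta PT}{N}\, c(\theta, \varphi)\Big)\, d\theta\, d\varphi \tag{34}
$$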



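where $c(\theta, \varphi)$, defined in (15), is the two-dimensional power spectral density of the channel
process $\{h[k, n]\}$. Finally, we use the bound (34) in (33), relate $c(\theta, \varphi)$ to the scattering function $C_{\mathbb{H}}(\nu, \tau)$
by means of (16) and get

$$
\begin{aligned}
\inf_{\mathcal{Q}|_{\alpha}} I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{x})
&\ge \frac{\alpha KN}{\beta} \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} \log\Big(1 + \frac{\beta P}{NF} \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} C_{\mathbb{H}}\Big(\frac{\theta - k}{T}, \frac{\varphi - n}{F}\Big)\Big)\, d\theta\, d\varphi \\
&= \frac{\alpha KN}{\beta} \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} \log\Big(1 + \frac{\beta P}{NF}\, C_{\mathbb{H}}\Big(\frac{\theta}{T}, \frac{\varphi}{F}\Big)\Big)\, d\theta\, d\varphi \\
&= \frac{\alpha KNTF}{\beta} \int_{\nu}\int_{\tau} \log\Big(1 + \frac{\beta P}{NF}\, C_{\mathbb{H}}(\nu, \tau)\Big)\, d\tau\, d\nu
\end{aligned}
\tag{35}
$$

where the last two equalities result from steps similar to the ones used in (17).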
c) Completing the proof: We insert (31) and (35) in (30), divide by $KT$, and set $W = NF$ to
obtain the following upper bound on capacity (25):

$$
C(W) \le \sup_{0 \le \alpha \le 1} \Big\{ \frac{W}{TF} \log\Big(1 + \alpha\, \frac{PTF}{W}\Big) - \alpha\, \frac{W}{\beta} \int_{\nu}\int_{\tau} \log\Big(1 + \frac{\beta P}{W}\, C_{\mathbb{H}}(\nu, \tau)\Big)\, d\tau\, d\nu \Big\}. \tag{36}
$$

As the function to maximize in (36) is concave in $\alpha$, the maximizing value is unique. To conclude
the proof and obtain the bound (29), we perform an elementary optimization over $\alpha$ to find the
maximizing $\alpha(W)$ given in (29b). ■
The upper bound in Theorem 1 generalizes the upper bound [29, Eq. (2)], which holds only for
constant modulus signals, i.e., for signals whose magnitude $|x[k, n]|$ is the same for all $k$ and $n$. The
bounds (29a) and [29, Eq. (2)] are both explicit in the channel's scattering function, have similar
structure, and coincide for $\beta = 1$ when $\alpha(W) = 1$ in (29b).
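For a brick-shaped scattering function (constant and equal to the reciprocal of the spread on its support), the integral in (29c) collapses to a single logarithm, which makes Theorem 1 easy to evaluate. The sketch below is one way to do so under that assumption; the function names are illustrative, and the parameter values are those of the numerical example in Section III-D.

```python
import numpy as np

def penalty_brick(W, P, beta, spread):
    """A(W) in (29c) for a brick-shaped scattering function C_H = 1/spread."""
    return (W * spread / beta) * np.log1p(beta * P / (W * spread))

def objective(alpha, W, P, TF, beta, spread):
    """Function maximized over alpha in (36)."""
    A = penalty_brick(W, P, beta, spread)
    return (W / TF) * np.log1p(alpha * P * TF / W) - alpha * A

def upper_bound_u1(W, P, TF, beta, spread):
    """U_1(W) in (29a) with alpha(W) from (29b); returns (U_1, alpha)."""
    A = penalty_brick(W, P, beta, spread)
    alpha = min(1.0, (W / TF) * (1.0 / A - 1.0 / P))
    return objective(alpha, W, P, TF, beta, spread), alpha

W, P, TF, beta, spread = 1e9, 2.42e7, 1.25, 1.0, 1e-5
u1, alpha = upper_bound_u1(W, P, TF, beta, spread)

# Brute-force check that the closed-form alpha(W) maximizes the objective on [0, 1].
alpha_grid = np.linspace(0.0, 1.0, 1001)
brute_force_max = objective(alpha_grid, W, P, TF, beta, spread).max()
```

At this bandwidth the closed-form maximizer equals one, and the brute-force maximum over the grid does not exceed the value returned by `upper_bound_u1`.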



2) Conditions for $\alpha(W) = 1$: If $\alpha(W) = 1$ independently of $W$, the first term of the upper
bound $U_1(W)$ in (29a) can be interpreted as the capacity of an effective AWGN channel with
receive power $P$ and $W/(TF)$ degrees of freedom, while the second term can be seen as a penalty
term that characterizes the capacity loss because of channel uncertainty. We highlight the relation
between this penalty term and the error in predicting the channel from its noisy past and future in
Appendix B. For $\alpha(W) < 1$, the upper bound (29a) has a more complicated structure, which is
difficult to interpret. We show in Appendix C that a sufficient condition for $\alpha(W) = 1$ is⁷

$$
\Delta_{\mathbb{H}} \le \frac{\beta}{3TF} \tag{37a}
$$

and

$$
0 \le \frac{P}{W} < \frac{\Delta_{\mathbb{H}}}{\beta} \Big[ \exp\Big( \frac{\beta}{2TF\Delta_{\mathbb{H}}} \Big) - 1 \Big]. \tag{37b}
$$

As virtually all wireless channels are highly underspread, as $\beta \ge 1$, and as, typically, $TF \approx 1.25$,
condition (37a) is satisfied in all cases of practical interest, so that the only relevant condition
is (37b); but even for large channel spread $\Delta_{\mathbb{H}}$, this condition holds for all SNR values⁸ $P/W$ of
practical interest. As an example, consider a system with $\beta = 1$ and spread $\Delta_{\mathbb{H}} = 10^{-2}$; for this
choice, (37b) is satisfied for all SNR values less than 153 dB. As this value is far in excess of the
receive SNR encountered in practical systems, we can safely claim that a capacity upper bound of
practical interest results if we substitute $\alpha(W) = 1$ in (29a).

⁷ More precisely, in Appendix C we derive a sufficient condition for $\alpha(W) = 1$ that implies (37).
⁸ Recall that we normalized $N_0 = 1$.
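The 153 dB figure quoted above can be reproduced directly from the right-hand side of (37b). The following is a small sketch; the function names are illustrative, and the logarithm is evaluated in a way that avoids overflow when the spread is very small.

```python
import math

def condition_37a(beta, spread, TF):
    """Check the spread condition (37a)."""
    return spread <= beta / (3 * TF)

def snr_threshold_db(beta, spread, TF):
    """Right-hand side of (37b), expressed in dB."""
    x = beta / (2 * TF * spread)
    # For large x, exp(x) - 1 is numerically indistinguishable from exp(x);
    # working with logarithms avoids overflow.
    log10_expm1 = x / math.log(10) if x > 50 else math.log10(math.expm1(x))
    return 10 * (math.log10(spread / beta) + log10_expm1)

threshold_db = snr_threshold_db(beta=1.0, spread=1e-2, TF=1.25)
```

For the example in the text (peak-to-average ratio of one, spread of 0.01, and a grid product of 1.25), `threshold_db` comes out between 153 and 154 dB, consistent with the value stated above.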

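3) Impact of channel characteristics: The spread $\Delta_{\mathbb{H}}$ and the shape of the scattering
function $C_{\mathbb{H}}(\nu, \tau)$ are important characteristics of wireless channels. As the upper bound (29) is explicit
in the scattering function, we can analyze its behavior as a function of $\Delta_{\mathbb{H}}$ and $C_{\mathbb{H}}(\nu, \tau)$. We restrict
our discussion to the practically relevant case $\alpha(W) = 1$.

a) Channel spread: For fixed shape of the scattering function, the upper bound $U_1(W)$
decreases for increasing spread $\Delta_{\mathbb{H}}$. To see this, we define a normalized scattering function $\tilde{C}_{\mathbb{H}}(\nu, \tau)$
with unit spread,⁹ so that $C_{\mathbb{H}}(\nu, \tau) = \tilde{C}_{\mathbb{H}}\big(\nu/(2\nu_0), \tau/(2\tau_0)\big)/\Delta_{\mathbb{H}}$. By a change of variables, the
penalty term can now be written as

$$
\begin{aligned}
A(W) &= \frac{W}{\beta} \int_{\nu}\int_{\tau} \log\Big(1 + \frac{\beta P}{W}\, C_{\mathbb{H}}(\nu, \tau)\Big)\, d\tau\, d\nu \\
&= \frac{W \Delta_{\mathbb{H}}}{\beta} \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} \log\Big(1 + \frac{\beta P}{W \Delta_{\mathbb{H}}}\, \tilde{C}_{\mathbb{H}}(\nu, \tau)\Big)\, d\tau\, d\nu.
\end{aligned}
\tag{38}
$$

Because $\Delta_{\mathbb{H}} \log(1 + \rho/\Delta_{\mathbb{H}})$ is monotonically increasing in $\Delta_{\mathbb{H}}$ for any positive constant $\rho > 0$, the
penalty term $A(W)$ increases with increasing spread $\Delta_{\mathbb{H}}$. As the first term in (29a) does not depend
on $\Delta_{\mathbb{H}}$, the upper bound $U_1(W)$ decreases with increasing spread.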

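b) Shape of the scattering function: For fixed spread $\Delta_{\mathbb{H}}$, the scattering function that results
in the lowest upper bound $U_1(W)$ is the "brick-shaped" scattering function: $C_{\mathbb{H}}(\nu, \tau) = 1/\Delta_{\mathbb{H}}$
for $(\nu, \tau) \in [-\nu_0, \nu_0] \times [-\tau_0, \tau_0]$. We prove this claim in two steps. First, we apply Jensen's
inequality to the penalty term in (29c):

$$
\begin{aligned}
\int_{\nu}\int_{\tau} \log\Big(1 + \frac{\beta P}{W}\, C_{\mathbb{H}}(\nu, \tau)\Big)\, d\tau\, d\nu
&\le \Delta_{\mathbb{H}} \log\Big(1 + \frac{\beta P}{W \Delta_{\mathbb{H}}} \int_{\nu}\int_{\tau} C_{\mathbb{H}}(\nu, \tau)\, d\tau\, d\nu\Big) \\
&= \Delta_{\mathbb{H}} \log\Big(1 + \frac{\beta P}{\Delta_{\mathbb{H}} W}\Big).
\end{aligned}
\tag{39}
$$

Second, we note that a brick-shaped scattering function achieves this upper bound.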
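The Jensen step in (39) can be illustrated with a piecewise-constant scattering function. The sketch below uses hypothetical values for the spread and for the ratio of peak-scaled power to bandwidth, tiles the support with equal-area cells, and compares a randomly shaped unit-power profile against the brick-shaped one; all names and numbers are illustrative.

```python
import numpy as np

def penalty_integral(cell_values, spread, rho):
    """Integral of log(1 + rho * C_H) over the support for a piecewise-constant C_H.

    cell_values holds C_H on equal-area cells that tile the support (total area = spread).
    """
    cell_area = spread / cell_values.size
    return cell_area * np.sum(np.log1p(rho * cell_values))

rng = np.random.default_rng(1)
spread, rho = 1e-2, 5.0                  # hypothetical spread and value of beta*P/W
num_cells = 400

weights = rng.random(num_cells)
weights /= weights.sum()                 # unit total power
uneven = weights * num_cells / spread    # non-flat C_H that integrates to one
brick = np.full(num_cells, 1.0 / spread) # brick-shaped C_H

penalty_uneven = penalty_integral(uneven, spread, rho)
penalty_brick = penalty_integral(brick, spread, rho)
jensen_bound = spread * np.log1p(rho / spread)
```

The brick-shaped profile attains the Jensen bound, while the uneven profile gives a smaller penalty and therefore a larger value of the upper bound.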

The observation that a brick-shaped scattering function minimizes the upper bound $U_1(W)$
sheds some light on the common practice to use $\nu_0$ and $\tau_0$, rather than $C_{\mathbb{H}}(\nu, \tau)$, in the design of a
communication system. A design on the basis of $\nu_0$ and $\tau_0$ is implicitly targeted at a channel with
brick-shaped scattering function, i.e., at the worst-case channel.

⁹ Recall that we normalized $\sigma_{\mathbb{H}}^2 = 1$ in (17).

C. Lower Bound 

1) A lower bound in terms of the multivariate spectrum of $\{\mathbf{h}[k]\}$: To state our lower bound on
the capacity (25), we require the following definitions.

• Let $\mathbf{C}(\theta)$ denote the matrix-valued power spectral density of the multivariate channel
process $\{\mathbf{h}[k]\}$, i.e.,

$$
\mathbf{C}(\theta) = \sum_{k=-\infty}^{\infty} \mathbf{R}_{\mathbf{h}}[k]\, e^{-j2\pi k\theta}, \qquad |\theta| \le \frac{1}{2}. \tag{40}
$$

• Let $I(y; x \,|\, h)$ denote the coherent mutual information of a scalar, memoryless Rayleigh-fading
channel $y = hx + w$ with $h \sim \mathcal{CN}(0, 1)$, additive noise $w \sim \mathcal{CN}(0, 1)$, and zero-mean
constant-modulus input signal, i.e., $|x|^2 = \gamma PT/N$ w.p.1.

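Theorem 2: Consider an underspread Rayleigh-fading channel with scattering function $C_{\mathbb{H}}(\nu, \tau)$.
Assume that the channel input $\mathbf{x}$ satisfies the average-power constraint $\mathbb{E}[\|\mathbf{x}\|^2] \le KPT$ and
the peak constraint $|x[k,n]|^2 \le \beta PT/N$ w.p.1. The capacity of this channel is lower-bounded
as $C(W) \ge L_1(W)$, where

$$
L_1(W) = \max_{1 \le \gamma \le \beta} \Big\{ \frac{W}{\gamma TF}\, I(y; x \,|\, h) - \frac{1}{\gamma T} \int_{-1/2}^{1/2} \log\det\Big(\mathbf{I}_N + \frac{\gamma PTF}{W}\, \mathbf{C}(\theta)\Big)\, d\theta \Big\}. \tag{41}
$$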

Proof: We obtain a lower bound on capacity by computing the mutual information for a specific
input distribution. A simple scheme is to send symbols that have zero mean, are i.i.d. over time
and frequency slots and have constant magnitude, i.e., $|x[k, n]|^2 = PT/N$ for $k = 0, 1, \ldots, K - 1$
and $n = 0, 1, \ldots, N - 1$. The average power constraint is then satisfied with equality. We denote
a $KN$-dimensional input vector that follows this distribution by $\mathbf{u}$; this vector has entries $u[k, n]$
that are first stacked in frequency and then in time, analogously to the definitions of $\mathbf{x}$ and $\mathbf{y}$ in
Section II-D.

We use the chain rule for mutual information and the fact that mutual information is nonnegative
to obtain the following bound:

$$
\begin{aligned}
I(\mathbf{y}; \mathbf{u}) &= I(\mathbf{y}; \mathbf{u}, \mathbf{h}) - I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{u}) \\
&= I(\mathbf{y}; \mathbf{h}) + I(\mathbf{y}; \mathbf{u} \,|\, \mathbf{h}) - I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{u}) \\
&\ge I(\mathbf{y}; \mathbf{u} \,|\, \mathbf{h}) - I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{u}).
\end{aligned}
\tag{42}
$$

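Next, we evaluate the two terms on the RHS of the above inequality separately. The first term
satisfies

$$
I(\mathbf{y}; \mathbf{u} \,|\, \mathbf{h}) = KN\, I(y; u \,|\, h) \tag{43}
$$

where we set $h = h[k,n]$ and $u = u[k,n]$ for arbitrary $k$ and $n$ because (i) the input vector $\mathbf{u}$
has i.i.d. entries, and (ii) all channel coefficients have the same distribution. The second term equals

$$
\begin{aligned}
I(\mathbf{y}; \mathbf{h} \,|\, \mathbf{u})
&= \mathbb{E}_{\mathbf{u}}\big[\log\det\big(\mathbf{I}_{KN} + (\mathbf{u}\mathbf{u}^H) \odot \mathbf{R}_{\mathbf{h}}\big)\big] \\
&= \mathbb{E}_{\mathbf{u}}\big[\log\det\big(\mathbf{I}_{KN} + \operatorname{diag}(\mathbf{u})\, \mathbf{R}_{\mathbf{h}}\, \operatorname{diag}(\mathbf{u})^H\big)\big] \\
&\overset{(a)}{=} \mathbb{E}_{\mathbf{u}}\big[\log\det\big(\mathbf{I}_{KN} + \operatorname{diag}(\mathbf{u})^H \operatorname{diag}(\mathbf{u})\, \mathbf{R}_{\mathbf{h}}\big)\big] \\
&\overset{(b)}{=} \log\det\Big(\mathbf{I}_{KN} + \frac{PT}{N}\, \mathbf{R}_{\mathbf{h}}\Big)
\end{aligned}
\tag{44}
$$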

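where (a) follows from the identity $\det(\mathbf{I} + \mathbf{A}\mathbf{B}^H) = \det(\mathbf{I} + \mathbf{B}^H\mathbf{A})$ for any $\mathbf{A}$ and $\mathbf{B}$ of
appropriate dimension [64, Th. 1.3.20], and (b) follows from the constant modulus assumption. We
now combine the two terms (43) and (44), set $W = NF$, divide by $KT$, and take the limit $K \to \infty$
to obtain the following lower bound:

$$
\begin{aligned}
C(W) &\ge \lim_{K \to \infty} \frac{1}{KT}\, I(\mathbf{y}; \mathbf{u}) \\
&\ge \frac{W}{TF}\, I(y; u \,|\, h) - \lim_{K \to \infty} \frac{1}{KT} \log\det\Big(\mathbf{I}_{KN} + \frac{PTF}{W}\, \mathbf{R}_{\mathbf{h}}\Big).
\end{aligned}
\tag{45}
$$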
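The simplification in (44) for constant-modulus inputs can be verified numerically. The following sketch uses a randomly generated Hermitian positive semidefinite matrix as a stand-in for the channel correlation matrix and a hypothetical per-symbol power; both are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, power = 8, 0.3   # hypothetical dimension (stands in for KN) and per-symbol power PT/N

# Any Hermitian positive semidefinite matrix serves as a stand-in for R_h.
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = G @ G.conj().T / n

# Constant-modulus input: fixed magnitude, uniformly random phases.
u = np.sqrt(power) * np.exp(2j * np.pi * rng.random(n))

# slogdet returns (sign, log|det|); only the log-magnitude is needed here.
_, logdet_hadamard = np.linalg.slogdet(np.eye(n) + np.outer(u, u.conj()) * R)
_, logdet_scaled = np.linalg.slogdet(np.eye(n) + power * R)
```

The two log-determinants agree for every realization of the phases, which is why the expectation over the input disappears in the last line of (44).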



The correlation matrix $\mathbf{R}_{\mathbf{h}}$ is two-level Toeplitz, with blocks that are $N \times N$ correlation
matrices $\mathbf{R}_{\mathbf{h}}[k]$, as shown in (22) and (19), respectively. Hence, we can explicitly evaluate the limit
on the RHS of (45) and express it in terms of an integral over the matrix-valued power spectral
density $\mathbf{C}(\theta)$ of the multivariate channel process $\{\mathbf{h}[k]\}$. By direct application of [34, Th. 3.4], an
extension of Szego's theorem (on the asymptotic eigenvalue distribution of Toeplitz matrices) to
two-level Toeplitz matrices, we obtain

$$
\lim_{K \to \infty} \frac{1}{KT} \log\det\Big(\mathbf{I}_{KN} + \frac{PTF}{W}\, \mathbf{R}_{\mathbf{h}}\Big) = \frac{1}{T} \int_{-1/2}^{1/2} \log\det\Big(\mathbf{I}_N + \frac{PTF}{W}\, \mathbf{C}(\theta)\Big)\, d\theta. \tag{46}
$$



The lower bound that results upon substitution of (46) into (45) can be tightened by time-sharing [27,
Cor. 2.1]: we allow the input signal to have squared magnitude $\gamma PTF/W$ during a fraction $1/\gamma$
of the total transmission time, where $1 \le \gamma \le \beta$; that is, we set $\mathbf{x} = \sqrt{\gamma}\,\mathbf{u}$ during this time; for the
remaining transmission time, the transmitter is silent, so that the constraint on the average power is
satisfied. ■

The evaluation of $L_1(W)$ in (41) is complicated by two facts: (i) the mutual information $I(y; x \,|\, h)$
in the first term on the RHS of (41) needs to be evaluated for a constant-modulus input; (ii) the
eigenvalues of $\mathbf{C}(\theta)$ in the second term (the penalty term) can in general not be derived in closed form.
While efficient numerical algorithms exist to evaluate the coherent mutual information $I(y; x \,|\, h)$
for constant-modulus inputs [65], numerically computing the eigenvalues of the $N \times N$ matrix $\mathbf{C}(\theta)$
is challenging for channels of very wide bandwidth because the matrix $\mathbf{C}(\theta)$ will be large. In the
following lemma, we present two bounds on the second term of $L_1(W)$ that are easy to compute.
Lemma 3: Let

$$
d_i = \operatorname{Re}\Big\{ \frac{2}{N} \sum_{n=0}^{N-1} (N - n)\, R_{\mathbb{H}}[0, n]\, e^{-j2\pi \frac{in}{N}} \Big\} - 1. \tag{47}
$$

Then, the penalty term in (41) (for the case $\gamma = 1$) can be bounded as follows:

$$
2\nu_0 \sum_{i=0}^{N-1} \log\Big(1 + \frac{PF}{2\nu_0 W}\, d_i\Big)
\ge \frac{1}{T} \int_{-1/2}^{1/2} \log\det\Big(\mathbf{I}_N + \frac{PTF}{W}\, \mathbf{C}(\theta)\Big)\, d\theta
\ge W \int_{\nu}\int_{\tau} \log\Big(1 + \frac{P}{W}\, C_{\mathbb{H}}(\nu, \tau)\Big)\, d\tau\, d\nu. \tag{48}
$$

Furthermore, the following asymptotic results hold:

• The penalty term and its lower bound in (48) have the same Taylor series expansion around
the point $1/W = 0$ up to any order.

• For scattering functions that are flat in the Doppler domain, i.e., that satisfy¹⁰

$$
C_{\mathbb{H}}(\nu, \tau) = \frac{1}{2\nu_0}\, p_{\mathbb{H}}(\tau), \qquad (\nu, \tau) \in [-\nu_0, \nu_0] \times [-\tau_0, \tau_0], \tag{49}
$$

the upper bound and the lower bound in (48) have the same Taylor series expansion around
the point $1/W = 0$ up to any order.

Proof: See Appendix D. ■
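The coefficients in (47) are a tapered discrete Fourier transform of the frequency-correlation samples, so they can be obtained with one FFT. The sketch below does this for a brick-shaped scattering function and compares the three quantities of the sandwich in (48). Several ingredients are assumptions made for illustration: the sinc closed form of the correlation samples, the reading of the constants in (48) used here, and the closed-form expression for the middle term, which is derived for a scattering function that is flat in Doppler and satisfies the Nyquist condition in time. The function names are hypothetical.

```python
import numpy as np

def tapered_dft_coefficients(r_freq):
    """d_i in (47) from the samples R_H[0, n], n = 0, ..., N-1, via one FFT."""
    r_freq = np.asarray(r_freq)
    N = len(r_freq)
    tapered = (N - np.arange(N)) * r_freq
    return np.real(2.0 / N * np.fft.fft(tapered)) - 1.0

def penalty_sandwich_brick(N, P, F, tau0, nu0):
    """Upper bound, middle term, and lower bound in (48) for a brick-shaped C_H."""
    W, spread = N * F, 4 * nu0 * tau0
    r_freq = np.sinc(2 * tau0 * F * np.arange(N))      # assumed closed form of R_H[0, n]
    gain = P * F / (2 * nu0 * W)
    upper = 2 * nu0 * np.sum(np.log1p(gain * tapered_dft_coefficients(r_freq)))
    idx = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    _, logdet = np.linalg.slogdet(np.eye(N) + gain * r_freq[idx])
    middle = 2 * nu0 * logdet                          # penalty term for flat Doppler profile
    lower = W * spread * np.log1p(P / (W * spread))
    return upper, middle, lower

N, P, F, tau0, nu0 = 256, 2.42e7, 3.53e3, 0.5e-6, 5.0
upper, middle, lower = penalty_sandwich_brick(N, P, F, tau0, nu0)
d = tapered_dft_coefficients(np.sinc(2 * tau0 * F * np.arange(N)))
```

The FFT costs on the order of N log N operations, whereas the middle term requires a determinant of an N x N matrix, which is the efficiency argument made for the resulting bound.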



The bounds (48) on the penalty term allow us to further bound $L_1(W)$. If we replace the penalty
term in (41) with its upper bound in (48), we obtain the following lower bound on $L_1(W)$ and,
hence, on capacity:

$$
L_1(W) \ge L_2(W) = \max_{1 \le \gamma \le \beta} \Big\{ \frac{W}{\gamma TF}\, I(y; x \,|\, h) - \frac{2\nu_0}{\gamma} \sum_{i=0}^{N-1} \log\Big(1 + \frac{\gamma PF}{2\nu_0 W}\, d_i\Big) \Big\}. \tag{50}
$$

The lower bound $L_2(W)$ can be evaluated numerically in a much more efficient way than $L_1(W)$
because the coefficients $\{d_i\}$ can be computed from the samples $\{(N - n) R_{\mathbb{H}}[0, n]\}$ through the
discrete Fourier transform (DFT). If, instead, we replace the penalty term in (41) with its lower
bound in (48), we obtain

$$
L_1(W) \le L_a(W) = \max_{1 \le \gamma \le \beta} \Big\{ \frac{W}{\gamma TF}\, I(y; x \,|\, h) - \frac{W}{\gamma} \int_{\nu}\int_{\tau} \log\Big(1 + \frac{\gamma P}{W}\, C_{\mathbb{H}}(\nu, \tau)\Big)\, d\tau\, d\nu \Big\}. \tag{51}
$$

Furthermore, for large bandwidth we can replace the coherent mutual information $I(y; x \,|\, h)$ in (51)
with its second-order Taylor series expansion [14, Th. 14] to obtain the approximation

$$
L_a(W) \approx L_{aa}(W) = \max_{1 \le \gamma \le \beta} \Big\{ P - \frac{\gamma P^2 TF}{W} - \frac{W}{\gamma} \int_{\nu}\int_{\tau} \log\Big(1 + \frac{\gamma P}{W}\, C_{\mathbb{H}}(\nu, \tau)\Big)\, d\tau\, d\nu \Big\}. \tag{52}
$$

It follows from Lemma 3 that $L_1(W)$ and $L_a(W)$ have the same Taylor series expansion around $1/W = 0$
up to any order, so that $L_1(W) \approx L_a(W) \approx L_{aa}(W)$ for large enough $W$. Furthermore, for
scattering functions that satisfy (49) (e.g., a brick-shaped scattering function), also $L_1(W)$ and
$L_2(W)$ have the same Taylor series expansion around $1/W = 0$ up to any order. Hence, $L_2(W) \approx
L_1(W) \approx L_a(W)$ for large enough $W$, for scattering functions that satisfy (49).



D. Numerical Example 

¹⁰ The multiplication by $1/(2\nu_0)$ in (49) follows from the normalization $\sigma_{\mathbb{H}}^2 = 1$.

Fig. 1. The upper bounds $U_c(W)$ in (28) and $U_1(W)$ in (29), as well as the lower bound $L_2(W)$ in (50), and the large-bandwidth
approximations of $L_1(W)$ in (51) and (52) for $\beta = 1$ and a brick-shaped scattering function with spread $\Delta_{\mathbb{H}} = 10^{-5}$. (Horizontal axis: bandwidth [GHz].)

We next evaluate the bounds found in the previous section for the following set of practically
relevant system parameters:

• Brick-shaped scattering function with maximum delay $\tau_0 = 0.5\ \mu\text{s}$, maximum Doppler shift $\nu_0 =
5$ Hz, and corresponding spread $\Delta_{\mathbb{H}} = 4\tau_0\nu_0 = 10^{-5}$.

• Grid parameters $T = 0.35$ ms and $F = 3.53$ kHz, so that $TF \approx 1.25$ and $T/F = \tau_0/\nu_0$, as
suggested by the design rule (13).

• Receive power normalized with respect to the noise spectral density

$$
\frac{P}{1\ \text{W/Hz}} = 2.42 \cdot 10^{7}\ \text{s}^{-1}.
$$
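The grid parameters listed above follow from two requirements: a prescribed product of the grid spacings and a ratio of the grid spacings equal to the ratio of maximum delay to maximum Doppler shift. The sketch below solves these two equations; the function name is illustrative.

```python
import math

def grid_parameters(tau0, nu0, tf_product):
    """Return (T, F) such that T*F = tf_product and T/F = tau0/nu0."""
    T = math.sqrt(tf_product * tau0 / nu0)
    F = tf_product / T
    return T, F

tau0, nu0 = 0.5e-6, 5.0            # maximum delay [s], maximum Doppler shift [Hz]
T, F = grid_parameters(tau0, nu0, 1.25)
spread = 4 * tau0 * nu0
nyquist_ok = T <= 1 / (2 * nu0) and F <= 1 / (2 * tau0)
```

The resulting values round to the 0.35 ms and 3.53 kHz quoted in the list, and both Nyquist conditions are met with a wide margin.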

These parameter values are representative for several different types of systems. For example:

(a) An IEEE 802.11a system with transmit power of 200 mW, pathloss of 118 dB, and receiver
noise figure [66] of 5 dB; the pathloss is rather pessimistic for typical indoor link distances and
includes the attenuation of the signal, e.g., by a concrete wall.

(b) A UWB system with transmit power of 0.5 mW, pathloss of 77 dB, and receiver noise figure
of 20 dB.

Fig. 1 shows the upper bounds $U_c(W)$ in (28) and $U_1(W)$ in (29), as well as the lower bound $L_2(W)$
in (50), and the large-bandwidth approximations $L_a(W)$ in (51) and $L_{aa}(W)$ in (52), all for $\beta = 1$.
As brick-shaped scattering functions are flat in the Doppler domain, i.e., they satisfy the condition
in (49), it follows from Lemma 3 that the difference between $L_a(W)$ and the lower bound $L_2(W)$
in (50) vanishes as $W \to \infty$. For our choice of parameters, this difference is so small even for finite
bandwidth that the curves for $L_a(W)$ and the lower bound $L_2(W)$ cannot be distinguished in Fig. 1.
As $L_2(W) \le L_1(W) \le L_a(W)$, the lower bound $L_1(W)$ is fully characterized as well.

The upper bound $U_1(W)$ and the lower bound $L_1(W)$ take on their maximum at a large but finite
bandwidth; beyond this critical bandwidth, additional bandwidth is detrimental and the capacity
approaches zero as bandwidth increases further. In particular, we can see from Fig. 1 that many
current wireless systems operate well below the critical bandwidth. It can furthermore be verified
numerically that the critical bandwidth increases with decreasing spread, consistent with our analysis
in Section III-B3. We also observed that the gap between upper and lower bounds increases with
increasing $\beta$.

For bandwidths smaller than the critical bandwidth, $L_1(W)$ comes quite close to the coherent
upper bound $U_c(W)$; this seems to validate, at least for the setting considered, the standard receiver
design principle to first estimate the channel, and then use the resulting estimates as if they were
perfect.

The approximate lower bound $L_{aa}(W)$ in (52) is accurate for bandwidths above the critical
bandwidth and very loose otherwise. Furthermore, $U_1(W)$ and $L_{aa}(W)$ seem to fully characterize
$C(W)$ in the large-bandwidth regime. We will make this statement precise in the next section,
where we relate $U_1(W)$ and $L_1(W)$ to the first-order Taylor series expansion of $C(W)$ around the
point $1/W = 0$.
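The existence of a critical bandwidth can be explored with a few lines of code. The sketch below evaluates the upper bound of Theorem 1 and the large-bandwidth approximation (52) on a logarithmic bandwidth grid for the parameters of this section, assuming a brick-shaped scattering function and a peak-to-average ratio of one (so that the maximization over the time-sharing parameter is trivial). The grid limits and variable names are illustrative, and no attempt is made to reproduce the exact curves of Fig. 1.

```python
import numpy as np

P, TF, beta, spread = 2.42e7, 1.25, 1.0, 1e-5   # parameters of the numerical example

W = np.logspace(6, 15, 901)                                  # bandwidth grid [Hz]
A = (W * spread / beta) * np.log1p(beta * P / (W * spread))  # penalty (29c), brick shape
alpha = np.minimum(1.0, (W / TF) * (1.0 / A - 1.0 / P))      # (29b)
U1 = (W / TF) * np.log1p(alpha * P * TF / W) - alpha * A     # (29a)
Laa = P - P**2 * TF / W - A                                  # (52) with gamma = beta = 1

critical_W_upper = W[np.argmax(U1)]    # bandwidth at which U_1 peaks on this grid
critical_W_approx = W[np.argmax(Laa)]  # bandwidth at which the approximation peaks
```

On this grid both curves peak at an interior point, and the approximation never exceeds the upper bound, which matches the qualitative behavior described above.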
E. Capacity in the Infinite-Bandwidth Limit

The plots in Fig. 1 of the upper bound $U_1(W)$ and the lower bound $L_1(W)$ seem to coincide
for large bandwidth, yet it is not clear a priori if the two bounds allow us to characterize capacity in
the limit for $W \to \infty$. To address this question, we next investigate if both bounds have the same
first-order Taylor series expansion in $1/W$ around the point $1/W = 0$.

Because the upper bound $U_1(W)$ in (29) takes on two different forms, depending on the value of
the parameter $\alpha(W)$ in (29b), its first-order Taylor series is somewhat tedious to derive. We state
the result in the following lemma and provide the derivation in Appendix E.
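Lemma 4: Let

$$\kappa_{\mathbb{H}} = \iint C_{\mathbb{H}}^2(\nu,\tau)\, d\tau\, d\nu. \tag{53}$$

Then, the upper bound (29) in Theorem 1 admits the following first-order Taylor series expansion
around the point $1/W = 0$:

$$U_1(W) = \frac{c}{W} + o\!\left(\frac{1}{W}\right) \tag{54a}$$

where

$$c = \lim_{W\to\infty} W\,U_1(W) = \begin{cases} \dfrac{P^2}{2}\,(\beta\kappa_{\mathbb{H}} - TF), & \text{if } \beta > \dfrac{2TF}{\kappa_{\mathbb{H}}} \\[2mm] \dfrac{(\beta P \kappa_{\mathbb{H}})^2}{8\,TF}, & \text{if } \beta \le \dfrac{2TF}{\kappa_{\mathbb{H}}}. \end{cases} \tag{54b}$$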

We show in Appendix F that the corresponding Taylor series expansion of the lower bound $L_1(W)$
in (41) does not have the same first-order term $c$. This result is formalized in the following lemma.

Lemma 5: The lower bound (41) in Theorem 2 admits the following first-order Taylor series
expansion around the point $1/W = 0$:

$$L_1(W) = \frac{\tilde{c}}{W} + o\!\left(\frac{1}{W}\right) \tag{55a}$$

where

$$\tilde{c} = \lim_{W\to\infty} W\,L_1(W) = \beta P^2\left(\frac{\kappa_{\mathbb{H}}}{2} - TF\right). \tag{55b}$$

As $c$ in (54b) and $\tilde{c}$ in (55b) are different, the two bounds $U_1(W)$ and $L_1(W)$ do not fully
characterize $C(W)$ in the wideband limit. In the next theorem, we show, however, that the first-order
Taylor series of $U_1(W)$ in Lemma 4 indeed correctly characterizes $C(W)$ for $W \to \infty$.

Theorem 6: Consider an underspread Rayleigh-fading channel with scattering function $C_{\mathbb{H}}(\nu,\tau)$.
Assume that the channel input $\mathbf{x}$ satisfies the average-power constraint $E[\|\mathbf{x}\|^2] \le KPT$ and the
peak constraint $|x[k,n]|^2 \le \beta PT/N$ w.p.1. The capacity $C(W)$ of this channel has a first-order
Taylor series expansion around the point $1/W = 0$ equal to the first-order Taylor series expansion
in (54).

Proof: We need a capacity lower bound different from $L_1(W)$ with the same asymptotic
behavior for $W \to \infty$ as the upper bound $U_1(W)$. The key element in the derivation of this
new lower bound is an extension of the block-constant signaling scheme used in [28] to prove
asymptotic capacity results for frequency-flat time-selective channels. In particular, we use input
signals with uniformly distributed phase whose magnitude is toggled on and off at random with a
prescribed probability; hence, information is encoded jointly in the amplitude and in the phase. In
comparison, the signaling scheme used to obtain $L_1(W)$ transmits a signal of constant amplitude in
all time-frequency slots. We present the details of the proof in Appendix G. ■
Similar to the capacity behavior of a discrete-time frequency-flat time-selective channel for
vanishing SNR [28], the first-order Taylor series coefficient in (54b) can take on two different forms
as a function of the channel parameters. However, the link in (16) between the discretized channel and
the WSSUS channel $\mathbb{H}$ allows us to conclude that $\beta > 2TF/\kappa_{\mathbb{H}}$ and thus $c = P^2(\beta\kappa_{\mathbb{H}} - TF)/2$ for
virtually all channels of practical interest. In fact, by Jensen's inequality, $\kappa_{\mathbb{H}} \ge \Delta_{\mathbb{H}}^{-1}$ (with equality
for brick-shaped scattering functions), so that $2TF\Delta_{\mathbb{H}} \ge 2TF/\kappa_{\mathbb{H}}$, and a sufficient condition
for $\beta > 2TF/\kappa_{\mathbb{H}}$ is $\beta > 2TF\Delta_{\mathbb{H}}$. For typical values of $TF$ (e.g., $TF \approx 1.25$) and typical values
of $\Delta_{\mathbb{H}}$ (e.g., $\Delta_{\mathbb{H}} < 10^{-2}$), this latter condition is satisfied for any admissible $\beta$.

We state in Lemma 5 that the first-order term $\tilde{c}$ in the Taylor series expansion of the lower
bound $L_1(W)$ does not match the corresponding term $c$ of the Taylor series expansion of capacity,
not even for realistic channel parameters as just discussed. Yet, the plots of the upper bound $U_1(W)$
and the lower bound $L_1(W)$ in Fig. 1 seem to coincide at large bandwidth. This observation is not
surprising as the ratio

$$\frac{\tilde{c}}{c} = \frac{\beta(\kappa_{\mathbb{H}}/2 - TF)}{(1/2)(\kappa_{\mathbb{H}}\beta - TF)}$$

approaches 1 for $\beta$ and $TF$ fixed as $\kappa_{\mathbb{H}}$ grows large. For example, we have $\tilde{c}/c = 0.998$ for the
same parameters we used for the numerical evaluation in Section III-D, i.e., $\Delta_{\mathbb{H}} = 10^{-3}$, $\beta = 1$,
and $TF = 1.25$.
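
The ratio of the two first-order coefficients can be reproduced with a few lines of Python. The sketch below is only illustrative: it assumes a brick-shaped scattering function, so that $\kappa_{\mathbb{H}} = 1/\Delta_{\mathbb{H}}$ (the equality case of Jensen's inequality mentioned above), it codes the coefficients as stated in (54b) and (55b), and the function name and the choice $P = 1$ are ours (the ratio does not depend on $P$).

```python
def taylor_coefficients(P, beta, TF, kappa):
    """First-order coefficients of the upper bound (54b) and the lower bound (55b)."""
    if beta > 2 * TF / kappa:
        c_upper = 0.5 * P**2 * (beta * kappa - TF)
    else:
        c_upper = (beta * P * kappa) ** 2 / (8 * TF)
    c_lower = beta * P**2 * (kappa / 2 - TF)
    return c_upper, c_lower

delta_h = 1e-3          # channel spread used in the numerical example
kappa = 1 / delta_h     # assumption: brick-shaped scattering function
beta, TF = 1.0, 1.25

# Sufficient condition discussed in the text: beta > 2 * TF * delta_h.
condition_holds = beta > 2 * TF * delta_h

c_upper, c_lower = taylor_coefficients(P=1.0, beta=beta, TF=TF, kappa=kappa)
ratio = c_lower / c_upper
print(condition_holds, ratio)
```

Under these assumptions the ratio evaluates to roughly 0.9987, in line with the value quoted above.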

IV. Infinite-Bandwidth Capacity under a Peak Constraint in Time

So far we considered a peak constraint in time and frequency; we now analyze the case when
the input signal is subject to a peak constraint in time only, according to (23). The average-power
constraint $E[\|\mathbf{x}\|^2] \le KPT$ remains in force. In addition, we focus on the infinite-bandwidth limit.
By means of a capacity lower bound that is explicit in the channel's scattering function, we show
that the phenomenon of vanishing capacity in the wideband limit can be eliminated if we allow the
transmit signal to be peaky in frequency. Furthermore, using the same approach as in the proof of
Theorem 1, we obtain an upper bound on the infinite-bandwidth capacity that, for $F = 1/(2\tau_0)$,
differs from the corresponding lower bound only by a Jensen penalty term. The two bounds coincide
for brick-shaped scattering functions when $F = 1/(2\tau_0)$.

The infinite-bandwidth capacity of the channel (11) is defined as

$$C_\infty = \lim_{N\to\infty}\,\lim_{K\to\infty}\,\sup_{\mathcal{S}}\,\frac{1}{KT}\, I(\mathbf{y};\mathbf{x}) \tag{56}$$

where the supremum is taken over the set $\mathcal{S}$ of all input distributions that satisfy the peak
constraint (23) and the constraint $E[\|\mathbf{x}\|^2] \le KPT$ on the average power.



A. Lower Bound

We obtain a lower bound on $C_\infty$ by evaluating the mutual information in (56) for a specific
signaling scheme. As signaling scheme, we consider a generalization in the channel's eigenspace
of the on-off FSK scheme proposed in [67]. The resulting lower bound is given in the following
theorem.

Theorem 7: Consider an underspread Rayleigh-fading channel with scattering function $C_{\mathbb{H}}(\nu,\tau)$;
assume that the channel input $\mathbf{x}$ satisfies the average-power constraint $E[\|\mathbf{x}\|^2] \le KPT$ and the
peak constraint $\sum_{n=0}^{N-1} |x[k,n]|^2 \le \beta PT$ w.p.1. The infinite-bandwidth capacity of this channel is
lower-bounded as $C_\infty \ge L_\infty$, where

$$L_\infty = P - \frac{1}{\beta}\int_\nu \log\bigl(1 + \beta P\, q_{\mathbb{H}}(\nu)\bigr)\, d\nu \tag{57}$$

and $q_{\mathbb{H}}(\nu) = \int_\tau C_{\mathbb{H}}(\nu,\tau)\, d\tau$ denotes the power-Doppler profile of the channel.

Proof: See Appendix H. ■
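
To get a feeling for the lower bound (57), the following sketch evaluates it numerically. It is an illustration under stated assumptions, not part of the proof: the power-Doppler profile is taken to be flat on $[-\nu_0, \nu_0]$, the logarithm is natural (so rates are in nats per second), $P$ is understood as normalized by the noise level, and the names `l_infinity`, `nu0`, and the parameter values are hypothetical.

```python
import numpy as np

def l_infinity(P, beta, q, d_nu):
    """Riemann-sum evaluation of (57); q holds samples of the power-Doppler profile."""
    return P - np.sum(np.log1p(beta * P * q)) * d_nu / beta

nu0 = 50.0                      # hypothetical maximum Doppler shift
n = 2000
d_nu = 2 * nu0 / n              # cell width of a midpoint grid on [-nu0, nu0]
q = np.full(n, 1 / (2 * nu0))   # assumption: flat profile that integrates to one

P = 100.0
values = {beta: l_infinity(P, beta, q, d_nu) for beta in (1.0, 10.0, 1e3, 1e9)}
for beta, value in values.items():
    print(beta, value)
```

For a flat profile the integral has the closed form $P - (2\nu_0/\beta)\log(1 + \beta P/(2\nu_0))$; the printed values increase with $\beta$ and approach $P$ as the peak constraint is relaxed.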
For $\beta = 1$, the lower bound in (57) coincides with Viterbi's result on the rates achievable on
an AWGN channel with complex Gaussian input signals with spectral density $q_{\mathbb{H}}(\nu)$, modulated
by FSK tones [23, Eq. (39)]. Viterbi's setup is relevant for our analysis, because, for a WSSUS
channel with power-Doppler profile $q_{\mathbb{H}}(\nu)$, the output signal that corresponds to an FSK tone can
be well approximated by Viterbi's transmit signal whenever the observation interval at the receiver
is large and the maximum delay $\tau_0$ of the channel is much smaller than the observation interval [13,
Sec. 8.6]. The proof technique used to obtain Theorem 7 is, however, conceptually different from
that in [23]. On the basis of the interpretation of Viterbi's signaling scheme provided above, we can
summarize the proof technique in [23] as follows: first, a signaling scheme is chosen, namely FSK,
for transmission over a WSSUS channel; then, the resulting stochastic process at the channel output is
discretized by means of a Karhunen-Loève decomposition; finally, the result on the achievable rates
in [23, Eq. (39)] follows from an error exponent analysis of the discretized stochastic process and
from [13, Lemma 8.5.3], i.e., Szegő's theorem on the asymptotic eigenvalue distribution of self-adjoint
Toeplitz operators.

To prove Theorem 7, on the other hand, we first discretize the WSSUS underspread channel;
the rate achievable for a specific signaling scheme, which resembles FSK, then yields the
infinite-bandwidth capacity lower bound (57). The main tool used in the proof of Theorem 7 is a property
of the information divergence of FSK constellations, first presented by Butman & Klass [36].

For $\beta \to \infty$, i.e., when the input signal is subject only to an average-power constraint, $L_\infty$ in (57)
approaches the infinite-bandwidth capacity of an AWGN channel with the same receive power, as
previously demonstrated by Gallager [13]. The signaling scheme used in the proof of Theorem 7 is,
however, not the only scheme that approaches this limit when no peak constraints are imposed on the
input signal. In [15] we presented another signaling scheme, namely, TF pulse position modulation,
which exhibits the same behavior. The proof of [15, Th. 1] is similar to the proof of Theorem 7 in
Appendix H.

B. Upper Bound

In Theorem 8 below we present an upper bound on $C_\infty$ and identify a class of scattering functions
for which this upper bound and the lower bound (57) coincide if $F = 1/(2\tau_0)$. Unlike
the lower bound, which can be obtained both by Viterbi's approach and through our approach, the
upper bound presented below relies heavily on the discretization of the continuous-time WSSUS
underspread channel presented in Section II-B1.

Theorem 8: Consider an underspread Rayleigh-fading channel with scattering function $C_{\mathbb{H}}(\nu,\tau)$;
assume that the channel input $\mathbf{x}$ satisfies the average-power constraint $E[\|\mathbf{x}\|^2] \le KPT$ and the
peak constraint $\sum_{n=0}^{N-1} |x[k,n]|^2 \le \beta PT$ w.p.1. The infinite-bandwidth capacity of this channel is
upper-bounded as $C_\infty \le U_\infty$, where

$$U_\infty = P - \frac{F}{\beta}\iint \log\!\left(1 + \frac{\beta P}{F}\, C_{\mathbb{H}}(\nu,\tau)\right) d\tau\, d\nu. \tag{58}$$

Proof: See Appendix I. ■
As the upper bound (58) is a decreasing function of $F$, and as $F$ has to satisfy the Nyquist
condition $F \le 1/(2\tau_0)$, the upper bound is minimized when $F = 1/(2\tau_0)$. For this value of $F$,
Jensen's inequality applied to the second term on the RHS of (58) yields:

$$\begin{aligned} \frac{1}{2\tau_0\beta}\iint \log\bigl(1 + 2\tau_0\beta P\, C_{\mathbb{H}}(\nu,\tau)\bigr)\, d\tau\, d\nu &\le \frac{1}{\beta}\int_\nu \log\!\left(1 + \beta P \int_\tau C_{\mathbb{H}}(\nu,\tau)\, d\tau\right) d\nu \\ &= \frac{1}{\beta}\int_\nu \log\bigl(1 + \beta P\, q_{\mathbb{H}}(\nu)\bigr)\, d\nu. \end{aligned} \tag{59}$$

Hence, for $F = 1/(2\tau_0)$, the upper bound (58) and the lower bound (57) differ only by a Jensen
penalty term. It is interesting to observe that the Jensen penalty in (59) is zero whenever the scattering
function is flat in the delay domain, i.e., whenever $C_{\mathbb{H}}(\nu,\tau)$ is of the form¹¹

$$C_{\mathbb{H}}(\nu,\tau) = \frac{1}{2\tau_0}\, q_{\mathbb{H}}(\nu), \qquad (\nu,\tau) \in [-\nu_0,\nu_0]\times[-\tau_0,\tau_0]. \tag{60}$$

In this case, upper bound and lower bound coincide and the infinite-bandwidth capacity is fully
characterized by

$$C_\infty = P - \frac{1}{\beta}\int_\nu \log\bigl(1 + \beta P\, q_{\mathbb{H}}(\nu)\bigr)\, d\nu. \tag{61}$$
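
The role of the Jensen penalty can be illustrated numerically. The sketch below evaluates the upper bound (58) at $F = 1/(2\tau_0)$ and the lower bound (57) for two hypothetical scattering functions with the same flat power-Doppler profile: one that is flat in delay, as in (60), and one with a triangular delay profile. All parameter values, the triangular profile, and the helper names are assumptions made for illustration; natural logarithms are used.

```python
import numpy as np

def bounds(P, beta, C, d_nu, d_tau, tau0):
    """Return (U_inf, L_inf) from (58) and (57) for F = 1 / (2 * tau0).

    C[i, j] holds samples of the scattering function at (nu[i], tau[j]).
    """
    F = 1 / (2 * tau0)
    q = C.sum(axis=1) * d_tau                       # power-Doppler profile
    upper = P - F / beta * np.log1p(beta * P / F * C).sum() * d_nu * d_tau
    lower = P - np.log1p(beta * P * q).sum() * d_nu / beta
    return upper, lower

def midpoints(half_width, n):
    """Midpoint grid on [-half_width, half_width] and its cell width."""
    step = 2 * half_width / n
    return -half_width + (np.arange(n) + 0.5) * step, step

nu0, tau0 = 5.0, 0.01                                # hypothetical channel parameters
nu, d_nu = midpoints(nu0, 400)
tau, d_tau = midpoints(tau0, 400)
q = np.full(nu.size, 1 / (2 * nu0))                  # flat Doppler profile
flat = np.outer(q, np.full(tau.size, 1 / (2 * tau0)))      # form (60)
triangular = np.outer(q, (1 - np.abs(tau) / tau0) / tau0)  # not flat in delay

u_flat, l_flat = bounds(10.0, 2.0, flat, d_nu, d_tau, tau0)
u_tri, l_tri = bounds(10.0, 2.0, triangular, d_nu, d_tau, tau0)
print(u_flat - l_flat, u_tri - l_tri)
```

For the delay-flat case the two bounds agree up to rounding, whereas the triangular delay profile leaves a strictly positive gap, which is the Jensen penalty in (59).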



Expressions similar to (61) were found in [26] for the capacity per unit energy of a discrete-time
frequency-flat time-selective channel, and in [24], [25] for the infinite-bandwidth capacity of the
continuous-time counterpart of the same channel; in all cases a peak constraint is imposed on the
input signals. However, the results in [24]-[26] and our results are not directly related, as discussed
next.

¹¹ The multiplication by $1/(2\tau_0)$ in (60) follows from the normalization $\sigma_{\mathbb{H}}^2 = 1$.

1) Comparison with [24], [25]: The continuous-time time-selective frequency-flat channel
analyzed in [24], [25] belongs to the class of LFI channels. As explained in Section II-C, the
kernel of an LFI channel cannot be diagonalized as was done in Section II-B1 because LFI channels
are not of Hilbert-Schmidt type. Hence, the infinite-bandwidth capacity expressions found in [24],
[25] cannot be obtained from our upper and lower bounds simply by an appropriate choice of the
scattering function $C_{\mathbb{H}}(\nu,\tau)$ and of the grid parameters $T$ and $F$.

2) Comparison with [26]: For scattering functions that are flat in the delay domain [see (60)],
the discrete correlation function $R_{\mathbb{H}}[k,n]$ of our channel is given by

$$R_{\mathbb{H}}[k,n] = \iint C_{\mathbb{H}}(\nu,\tau)\, e^{j2\pi(kT\nu - nF\tau)}\, d\tau\, d\nu.$$

If we replace $F$ by $1/(2\tau_0)$, we obtain

$$R_{\mathbb{H}}[k,n] = \delta[n] \int_\nu q_{\mathbb{H}}(\nu)\, e^{j2\pi kT\nu}\, d\nu.$$

Hence, for scattering functions that satisfy (60), and for $F = 1/(2\tau_0)$, the discrete channel $h[k,n]$ is
uncorrelated in frequency $n$. Consequently, the input-output relation (21) reduces to the input-output
relation of $N$ parallel i.i.d. flat-fading channels that are selective in time. However, as both the
average-power constraint and the peak constraint are imposed on the overall channel and not on
each parallel channel separately, the infinite-bandwidth capacity (61) does not follow simply from
the capacity per unit energy of one of the parallel channels obtained in [26].
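
The decorrelation in frequency at $F = 1/(2\tau_0)$ can be checked numerically. The sketch below assumes a brick-shaped scattering function (flat in both delay and Doppler, which is one instance of (60)); in that case the double integral separates into two averages. The function name and all parameter values are hypothetical.

```python
import numpy as np

def correlation(k, n, T, F, nu0, tau0, m=2000):
    """Midpoint-rule evaluation of R[k, n] for a brick-shaped scattering function."""
    nu = -nu0 + (np.arange(m) + 0.5) * (2 * nu0 / m)
    tau = -tau0 + (np.arange(m) + 0.5) * (2 * tau0 / m)
    doppler = np.mean(np.exp(2j * np.pi * k * T * nu))   # (1 / (2 nu0)) * integral over nu
    delay = np.mean(np.exp(-2j * np.pi * n * F * tau))   # (1 / (2 tau0)) * integral over tau
    return doppler * delay

nu0, tau0 = 5.0, 0.01
F_nyquist = 1 / (2 * tau0)
T = 1.25 / F_nyquist                                     # hypothetical grid with TF = 1.25

r_same_slot = correlation(3, 0, T, F_nyquist, nu0, tau0)
r_other_slot = correlation(3, 1, T, F_nyquist, nu0, tau0)
r_oversampled = correlation(0, 1, T, F_nyquist / 2, nu0, tau0)
print(abs(r_same_slot), abs(r_other_slot), abs(r_oversampled))
```

At $F = 1/(2\tau_0)$ the correlation between different frequency slots vanishes up to rounding, while for a smaller $F$ neighbouring slots remain correlated.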

V. Conclusions

The underspread Gaussian WSSUS channel with a peak constraint on the input signal is a fairly
accurate and general model for wireless channels. Despite the model's mathematical elegance and
simplicity, it appears to be difficult to compute the corresponding capacity. To nonetheless study
capacity as a function of bandwidth, we have taken a three-step approach: we first approximated the
kernel of the continuous-time WSSUS channel by a kernel that can be diagonalized, and obtained
an equivalent discretized channel; in a second step, we derived upper and lower bounds on the
capacity of this discretized channel, and in a third step we expressed these bounds in terms of the
scattering function of the original continuous-time WSSUS channel. In Section II and Appendix A
we partially characterize the approximation error that arises when the original continuous-time
underspread WSSUS channel operator is replaced by a normal operator whose eigenfunctions are
a Weyl-Heisenberg set. A complete characterization of the approximation error would require
quantifying the difference between the null spaces and between the range spaces of the original operator
and its approximation. This characterization is a fundamental open problem, even for deterministic
operators.

The capacity bounds derived in this paper are explicit in the channel's scattering function, a
quantity that can be obtained from channel measurements. Furthermore, the capacity bounds
may serve as an efficient design tool even when the scattering function is not known completely,
and the channel is only characterized coarsely by its maximum delay $\tau_0$ and maximum Doppler
shift $\nu_0$. In particular, one can assume that the scattering function is brick-shaped within its support
area $[-\nu_0,\nu_0]\times[-\tau_0,\tau_0]$ and evaluate the corresponding bounds. As shown in Section III-B3b, a
brick-shaped scattering function results in the lowest upper bound for given $\tau_0$ and $\nu_0$. Furthermore,
the bounds are particularly easy to evaluate for brick-shaped scattering functions and result in
analytical expressions explicit in the channel spread $\Delta_{\mathbb{H}}$. Extensions of the capacity bounds for
input signals subject to a peak constraint in time and frequency to the case of spatially correlated
MIMO channels are provided in [68].

The multivariate discrete-time channel model considered in this paper, $\mathbf{y}[k] = \mathbf{h}[k] \odot \mathbf{x}[k] + \mathbf{w}[k]$,
and the corresponding capacity bounds are also of interest in their own right, without the connection
to the underlying WSSUS channel. The individual elements of the vector $\mathbf{h}[k]$ do not necessarily need
to be interpreted as discrete frequency slots; for example, the block-fading model with correlation
across blocks in [69] can be cast into the form of our multivariate discrete-time model as well.

As our model is a generalization of the time-selective, frequency-flat channel model, it is not
surprising that the structure of our bounds for the case of a peak constraint both in time and frequency,
and a peak constraint in time only, is similar to the corresponding results in [27], [28] and [24]-[26],
respectively. The key difference between our proofs and the proofs in [26], [28], [24] is that our
derivation of the upper bounds (29) and (58) (see Appendix B and Appendix I, respectively) is
based on the relation between mutual information and MMSE described in [35]. Compared to the
proof in [26, Sec. VI], our approach has the advantage that it can easily be generalized to multiple
dimensions (in our case time and frequency) and provides the new lower bound (73).

Numerical evaluation indicates that our bounds are surprisingly accurate over a large range of
bandwidth. For small bandwidth and hence high SNR, however, our bounds are no longer tight,
and a refined analysis along the lines of [5], [70] is called for. In the time-selective frequency-flat
case, it was shown in [5] that the high-SNR capacity behavior depends heavily on the spectral
density of the channel process. In particular, if the spectral density is zero on a set of positive
measure, capacity grows logarithmically in SNR, otherwise the growth is slower, and can even
be double-logarithmic. For the more general time- and frequency-selective channel considered
in this paper, the assumption that the scattering function is compactly supported implies that the
matrix-valued spectral density (40) of the multivariate discrete-time process is zero on a set of
positive measure whenever $T < 1/(2\nu_0)$. This implies that the capacity of the approximating
channel operator grows logarithmically at high SNR [70] whenever the sampling rate in time is
strictly larger than the Nyquist rate. The high-SNR behavior of the capacity of the original channel
operator might be different, though. In the approximating discrete-time discrete-frequency
input-output relation (11), ISI and ICI are neglected [see (12)]. But the high-SNR behavior of a fading
channel is heavily influenced by ISI and ICI, as recently shown in [71].

The approximate kernel diagonalization presented in Section II-B1 can be extended to WSSUS
channels with non-compactly supported scattering function, as long as the area of the effective support
of the scattering function is small [72]. The capacity bounds corresponding to a non-compactly
supported scattering function are, however, more difficult to evaluate numerically, because the
periodic repetitions of the scattering function in (16) fall inside the integration region.

A challenging open problem is to characterize the capacity behavior of overspread channels, i.e.,
channels with spread $\Delta_{\mathbb{H}} > 1$. The major difficulty resides in the fact that a set of deterministic
eigenfunctions can no longer be used to diagonalize the random kernel of the channel.

Appendix A

A. Approximate Eigenfunctions and Eigenvalues of the Channel Operator

The construction of the approximating channel operator in Section II-B1 relies on the following
two properties of underspread operators:

• Time and frequency shifts of a time- and frequency-localized prototype signal $g(t)$ matched to
the channel's scattering function $C_{\mathbb{H}}(\nu,\tau)$ are approximate eigenfunctions of $\mathbb{H}$.
• Samples of the time-varying transfer function $L_{\mathbb{H}}(t,f)$ are the corresponding approximate
eigenvalues.

In this appendix, we make these claims more precise and give bounds on the mean-square
approximation error, averaged with respect to the channel's realizations, for both approximate
eigenfunctions and eigenvalues. The results presented in the remainder of this appendix are not
novel, as they already appeared elsewhere, sometimes in different form [20], [72], [56], [42]; the
goal of this appendix is rather to provide a self-contained exposition.

1) Ambiguity function: The design problem for $g(t)$ can be restated in terms of its ambiguity
function $A_g(\nu,\tau)$, which is defined as [73]

$$A_g(\nu,\tau) = \int_t g(t)\, g^*(t-\tau)\, e^{-j2\pi\nu t}\, dt.$$

Without loss of generality, we can assume that $g(t)$ is normalized, so that $A_g(0,0) = \|g\|^2 = 1$. For
two signals $g(t)$ and $f(t)$, the cross-ambiguity function is defined as

$$A_{g,f}(\nu,\tau) = \int_t g(t)\, f^*(t-\tau)\, e^{-j2\pi\nu t}\, dt.$$

The following properties of the (cross-) ambiguity function are important in our context:

Property 1: The volume under the so-called ambiguity surface $|A_g(\nu,\tau)|^2$ is constant [74]. In
particular, if $g(t)$ has unit energy, then

$$\iint |A_g(\nu,\tau)|^2\, d\tau\, d\nu = 1.$$

Property 2: The ambiguity surface attains its maximum magnitude at the origin: $|A_g(\nu,\tau)|^2 \le
[A_g(0,0)]^2 = 1$, for all $\nu$ and $\tau$. This property follows from the Cauchy-Schwarz inequality, as
shown in [55].

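Property 3: The cross-ambiguity function between the two time- and frequency-shifted signals
$g_{(\alpha,\beta)}(t) = g(t-\alpha)\, e^{j2\pi\beta t}$ and $g_{(\alpha',\beta')}(t) = g(t-\alpha')\, e^{j2\pi\beta' t}$ is given by

$$\begin{aligned} A_{g_{(\alpha,\beta)},\, g_{(\alpha',\beta')}}(\nu,\tau) &= \int_t g(t-\alpha)\, e^{j2\pi\beta t}\, g^*(t-\alpha'-\tau)\, e^{-j2\pi\beta'(t-\tau)}\, e^{-j2\pi\nu t}\, dt \\ &\overset{(a)}{=} e^{j2\pi\beta'\tau}\, e^{-j2\pi(\nu+\beta'-\beta)\alpha} \int_{t'} g(t')\, g^*\bigl(t' - (\alpha'-\alpha) - \tau\bigr)\, e^{-j2\pi(\nu+\beta'-\beta)t'}\, dt' \\ &= A_g(\nu+\beta'-\beta,\ \tau+\alpha'-\alpha)\, e^{-j2\pi(\nu\alpha - \tau\beta')}\, e^{-j2\pi(\beta'-\beta)\alpha} \end{aligned} \tag{62}$$

where (a) follows from the change of variables $t' = t - \alpha$. As a direct consequence of (62), we have

$$A_{g_{(\alpha,\beta)}}(\nu,\tau) = A_g(\nu,\tau)\, e^{-j2\pi(\nu\alpha - \tau\beta)}. \tag{63}$$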



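Property 4: Let the unit-energy signal $g(t)$ have Fourier transform $G(f)$, and denote by $T_0$ and $F_0$,
defined as

$$T_0^2 = \int_t t^2\, |g(t)|^2\, dt, \qquad F_0^2 = \int_f f^2\, |G(f)|^2\, df, \tag{64}$$

the effective duration and the effective bandwidth of $g(t)$. Then $T_0^2$ and $F_0^2$ are proportional to the
second-order derivatives of $A_g(\nu,\tau)$ at the point $(\nu,\tau) = (0,0)$ [74]:

$$\left.\frac{\partial^2 A_g(\nu,\tau)}{\partial\nu^2}\right|_{(\nu,\tau)=(0,0)} = -4\pi^2 T_0^2, \qquad \left.\frac{\partial^2 A_g(\nu,\tau)}{\partial\tau^2}\right|_{(\nu,\tau)=(0,0)} = -4\pi^2 F_0^2.$$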



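Property 5: For the channel operator $\mathbb{H}$ in Section II-A,

$$\begin{aligned} \langle \mathbb{H} g, f \rangle &\overset{(a)}{=} \int_t \int_\tau \int_\nu S_{\mathbb{H}}(\nu,\tau)\, g(t-\tau)\, e^{j2\pi t\nu}\, f^*(t)\, d\tau\, d\nu\, dt \\ &= \iint S_{\mathbb{H}}(\nu,\tau) \left[ \int_t f(t)\, g^*(t-\tau)\, e^{-j2\pi\nu t}\, dt \right]^* d\tau\, d\nu \\ &= \iint S_{\mathbb{H}}(\nu,\tau)\, A^*_{f,g}(\nu,\tau)\, d\tau\, d\nu = \langle S_{\mathbb{H}}, A_{f,g} \rangle \end{aligned}$$

where in (a) we used (5).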
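
Several of the properties above can be checked numerically for a concrete pulse. The sketch below is an illustration only: it assumes a unit-energy Gaussian prototype $g(t) = 2^{1/4} e^{-\pi t^2}$ (our choice, not prescribed by the text), approximates the integrals by Riemann sums, and computes $F_0^2$ from the time-domain derivative via Parseval's relation. It verifies the bound of Property 2, the phase relation (63), and the product $T_0 F_0$ from (64). All function names and the test point are hypothetical.

```python
import numpy as np

t = np.arange(-10.0, 10.0, 1e-3)     # time grid wide enough for the Gaussian tails
dt = t[1] - t[0]

def gauss(x):
    """Unit-energy Gaussian pulse (assumed prototype g)."""
    return 2 ** 0.25 * np.exp(-np.pi * x ** 2)

def shifted(alpha, beta):
    """Time-frequency shifted pulse g(t - alpha) * exp(j 2 pi beta t)."""
    return lambda x: gauss(x - alpha) * np.exp(2j * np.pi * beta * x)

def cross_ambiguity(g, f, nu, tau):
    """Riemann sum for A_{g,f}(nu, tau) = int g(t) f*(t - tau) exp(-j 2 pi nu t) dt."""
    return np.sum(g(t) * np.conj(f(t - tau)) * np.exp(-2j * np.pi * nu * t)) * dt

nu, tau, alpha, beta = 0.3, -0.4, 0.7, 1.1
a_g = cross_ambiguity(gauss, gauss, nu, tau)
g_ab = shifted(alpha, beta)
a_shifted = cross_ambiguity(g_ab, g_ab, nu, tau)
predicted = a_g * np.exp(-2j * np.pi * (nu * alpha - tau * beta))   # relation (63)

T0_sq = np.sum(t ** 2 * gauss(t) ** 2) * dt                          # (64), time domain
F0_sq = np.sum(np.gradient(gauss(t), dt) ** 2) * dt / (4 * np.pi ** 2)  # (64) via Parseval
print(abs(a_g), abs(a_shifted - predicted), np.sqrt(T0_sq * F0_sq), 1 / (4 * np.pi))
```

For the Gaussian pulse the product $T_0 F_0$ matches $1/(4\pi)$ up to discretization error, i.e., this pulse meets the Heisenberg bound with equality.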

Properties 1 and 2, which constitute the radar uncertainty principle, imply that it is not possible
to find a signal $g(t)$ with a corresponding ambiguity function $A_g(\nu,\tau)$ that is arbitrarily well
concentrated in $\nu$ and $\tau$ [74]. The radar uncertainty principle is a manifestation of the classical
Heisenberg uncertainty principle, which states that the effective duration $T_0$ and the effective
bandwidth $F_0$ [both defined in (64)] of any signal in $\mathcal{L}^2$ satisfy $T_0 F_0 \ge 1/(4\pi)$ [55, Th. 2.2.1]. In
fact, when $g(t)$ has effective duration $T_0$ and effective bandwidth $F_0$, the corresponding ambiguity
function $A_g(\nu,\tau)$ is highly concentrated on a rectangle of area $4T_0F_0$; but this area cannot be made
arbitrarily small.

2) Approximate Eigenfunctions:

Lemma 9 ([20, Ch. 4.6.1]): Let $\mathbb{H}$ be a WSSUS channel with scattering function $C_{\mathbb{H}}(\nu,\tau)$. Then,
for any unit-energy signal $g(t)$, the mean-square approximation error incurred by assuming that $g(t)$
is an eigenfunction of $\mathbb{H}$ is given by

$$\epsilon_1 = E\bigl[\|\langle \mathbb{H}g, g\rangle g - \mathbb{H}g\|^2\bigr] = \iint C_{\mathbb{H}}(\nu,\tau)\bigl(1 - |A_g(\nu,\tau)|^2\bigr)\, d\tau\, d\nu. \tag{65}$$



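Proof: We decompose $\epsilon_1$ as follows:

$$\begin{aligned} E\bigl[\|\langle \mathbb{H}g, g\rangle g - \mathbb{H}g\|^2\bigr] &= E\bigl[\|\langle \mathbb{H}g, g\rangle g\|^2\bigr] + E\bigl[\|\mathbb{H}g\|^2\bigr] - 2E\bigl[|\langle \mathbb{H}g, g\rangle|^2\bigr] \\ &= E\bigl[\|\mathbb{H}g\|^2\bigr] - E\bigl[|\langle \mathbb{H}g, g\rangle|^2\bigr]. \end{aligned} \tag{66}$$

Here, the last step follows because $g(t)$ has unit energy by assumption. We now compute the two
terms in (66) separately. The first term is equal to

$$\begin{aligned} E\bigl[\|\mathbb{H}g\|^2\bigr] &\overset{(a)}{=} E\left[\int_t \left| \iint S_{\mathbb{H}}(\nu,\tau)\, g(t-\tau)\, e^{j2\pi t\nu}\, d\tau\, d\nu \right|^2 dt\right] \\ &\overset{(b)}{=} \iint C_{\mathbb{H}}(\nu,\tau) \int_t g(t-\tau)\, g^*(t-\tau)\, dt\, d\tau\, d\nu \\ &\overset{(c)}{=} \iint C_{\mathbb{H}}(\nu,\tau)\, d\tau\, d\nu \end{aligned} \tag{67}$$

where (a) follows from (5), (b) from the WSSUS property, and (c) from the energy normalization
of $g(t)$. For the second term we have

$$\begin{aligned} E\bigl[|\langle \mathbb{H}g, g\rangle|^2\bigr] &\overset{(a)}{=} E\bigl[|\langle S_{\mathbb{H}}, A_g\rangle|^2\bigr] = E\left[\left| \iint S_{\mathbb{H}}(\nu,\tau)\, A_g^*(\nu,\tau)\, d\tau\, d\nu \right|^2\right] \\ &\overset{(b)}{=} \iint C_{\mathbb{H}}(\nu,\tau)\, |A_g(\nu,\tau)|^2\, d\tau\, d\nu \end{aligned} \tag{68}$$

where (a) follows from Property 5 and (b) follows from the WSSUS property. To conclude the
proof, we substitute (67) and (68) in (66). ■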



The error $\epsilon_1$ in (65) is minimized if $g(t)$ is chosen so that $A_g(\nu,\tau) \approx A_g(0,0) = 1$ over the
support of the scattering function. If the channel is highly underspread, we can replace $A_g(\nu,\tau)$
on the RHS of (65) with its second-order Taylor series expansion around the point $(\nu,\tau) = (0,0)$;
Property 4 now shows that good time and frequency localization of $g(t)$ is necessary for $\epsilon_1$ to be
small. If $g(t)$ is taken to be real and even, the second-order Taylor series expansion of $A_g(\nu,\tau)$
around the point $(\nu,\tau) = (0,0)$ takes on a particularly simple form because the first-order term is
zero, and we can approximate $A_g(\nu,\tau)$ around $(0,0)$ as follows [74]:

$$A_g(\nu,\tau) \approx 1 - 2\pi^2\left[T_0^2\nu^2 + F_0^2\tau^2 - j\nu\tau/(4\pi)\right].$$

Hence, when $g(t)$ is real and even, good time and frequency localization of $g(t)$ is also sufficient
for $\epsilon_1$ to be small.

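3) Approximate Eigenvalues:

Lemma 10 ([72], [42]): Let $\mathbb{H}$ be a WSSUS channel with time-varying transfer function $L_{\mathbb{H}}(t,f)$
and scattering function $C_{\mathbb{H}}(\nu,\tau)$. Then, for any unit-energy signal $g_{(\alpha,\beta)}(t) = g(t-\alpha)\, e^{j2\pi\beta t}$,
the mean-square approximation error incurred by assuming that $L_{\mathbb{H}}(\alpha,\beta)$ is an eigenvalue of $\mathbb{H}$
associated with $g_{(\alpha,\beta)}(t)$ is given by

$$\epsilon_2 = E\bigl[|\langle \mathbb{H}g_{(\alpha,\beta)}, g_{(\alpha,\beta)}\rangle - L_{\mathbb{H}}(\alpha,\beta)|^2\bigr] = \iint C_{\mathbb{H}}(\nu,\tau)\, |1 - A_g(\nu,\tau)|^2\, d\tau\, d\nu.$$

Proof: We use Property 5 and the Fourier transform relation between $L_{\mathbb{H}}(t,f)$ and $S_{\mathbb{H}}(\nu,\tau)$ to
write $\epsilon_2$ as

$$\begin{aligned} \epsilon_2 &= E\left[\left| \iint S_{\mathbb{H}}(\nu,\tau) \left( A^*_{g_{(\alpha,\beta)}}(\nu,\tau) - e^{j2\pi(\nu\alpha - \tau\beta)} \right) d\tau\, d\nu \right|^2\right] \\ &\overset{(a)}{=} E\left[\left| \iint S_{\mathbb{H}}(\nu,\tau)\, e^{j2\pi(\nu\alpha - \tau\beta)} \left[ A_g^*(\nu,\tau) - 1 \right] d\tau\, d\nu \right|^2\right] \\ &\overset{(b)}{=} \iint C_{\mathbb{H}}(\nu,\tau)\, |1 - A_g(\nu,\tau)|^2\, d\tau\, d\nu. \end{aligned} \tag{69}$$

Here, (a) follows from (63) and (b) is a consequence of the WSSUS property. ■
As stated for $\epsilon_1$ in the previous section, good time and frequency localization of $g(t)$ again
leads to a small mean-square error $\epsilon_2$ if the channel is underspread.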



B. OFDM Pulse Design for Minimum ISI and ICI

In Section II-B3 we introduced the concept of a PS-OFDM system that uses an orthonormal
Weyl-Heisenberg transmission set $\{g_{k,n}(t)\}$, where $g_{k,n}(t) = g(t-kT)\, e^{j2\pi nFt}$, and provided the
criterion (13) for the choice of the grid parameters $T$ and $F$ to jointly minimize ISI and ICI. In
this section, we detail the derivation that leads to (13). Let $r(t) = (\mathbb{H}x)(t)$ denote the noise-free
channel output when the channel input $x(t)$ is a PS-OFDM signal given by

$$x(t) = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} x[k,n]\, g_{k,n}(t).$$

For mathematical convenience, we consider the case of an infinite time and frequency horizon, and
assume that the input symbols $\{x[k,n]\}$ are i.i.d., with zero mean and $E[|x[k,n]|^2] \le 1$ for all $k$, $n$.

We want to quantify the mean-square error incurred by assuming that the projection of the received
signal $r(t)$ onto the function $g_{k,n}(t)$ equals $x[k,n]\, L_{\mathbb{H}}(kT,nF)$, i.e., the error

$$\epsilon_3 = E\bigl[|\langle r, g_{k,n}\rangle - x[k,n]\, L_{\mathbb{H}}(kT,nF)|^2\bigr]$$

where the expectation is over the channel realizations and the input symbols. We bound $\epsilon_3$ as follows:



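$$\begin{aligned} \epsilon_3 &= E\Bigl[\bigl|\langle r, g_{k,n}\rangle - x[k,n]\langle \mathbb{H}g_{k,n}, g_{k,n}\rangle + x[k,n]\bigl(\langle \mathbb{H}g_{k,n}, g_{k,n}\rangle - L_{\mathbb{H}}(kT,nF)\bigr)\bigr|^2\Bigr] \\ &\overset{(a)}{\le} 2\underbrace{E\bigl[|\langle r, g_{k,n}\rangle - x[k,n]\langle \mathbb{H}g_{k,n}, g_{k,n}\rangle|^2\bigr]}_{\epsilon_4} + 2E\Bigl[\bigl|x[k,n]\bigl(\langle \mathbb{H}g_{k,n}, g_{k,n}\rangle - L_{\mathbb{H}}(kT,nF)\bigr)\bigr|^2\Bigr] \\ &= 2\epsilon_4 + 2E\bigl[|x[k,n]|^2\bigr]\underbrace{E\bigl[|\langle \mathbb{H}g_{k,n}, g_{k,n}\rangle - L_{\mathbb{H}}(kT,nF)|^2\bigr]}_{\epsilon_2} \\ &\le 2\epsilon_4 + 2\epsilon_2 \end{aligned}$$

where (a) holds because for any two complex numbers $u$ and $v$ we have that $|u+v|^2 \le 2|u|^2 + 2|v|^2$.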



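The error $\epsilon_2$ is the same as the one computed in Lemma 10. The error $\epsilon_4$ results from neglecting ISI
and ICI and can be bounded as follows:

$$\begin{aligned} \epsilon_4 &= E\bigl[|\langle r, g_{k,n}\rangle|^2\bigr] + E\bigl[|x[k,n]|^2\bigr]\, E\bigl[|\langle \mathbb{H}g_{k,n}, g_{k,n}\rangle|^2\bigr] - 2\Re\bigl\{E\bigl[x^*[k,n]\, \langle r, g_{k,n}\rangle\, \langle \mathbb{H}g_{k,n}, g_{k,n}\rangle^*\bigr]\bigr\} \\ &\overset{(a)}{=} \sum_{k'=-\infty}^{\infty} \sum_{\substack{n'=-\infty \\ (k',n')\ne(k,n)}}^{\infty} E\bigl[|x[k',n']|^2\bigr]\, E\bigl[|\langle \mathbb{H}g_{k',n'}, g_{k,n}\rangle|^2\bigr] \\ &\overset{(b)}{\le} \sum_{k'=-\infty}^{\infty} \sum_{\substack{n'=-\infty \\ (k',n')\ne(k,n)}}^{\infty} E\bigl[|\langle \mathbb{H}g_{k',n'}, g_{k,n}\rangle|^2\bigr] \end{aligned} \tag{70}$$

where (a) follows because the $x[k,n]$ are i.i.d. and zero mean, and (b) because $E[|x[k,n]|^2] \le 1$.
We now provide an expression for $E[|\langle \mathbb{H}g_{k',n'}, g_{k,n}\rangle|^2]$ that is explicit in the channel's scattering
function: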



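$$\begin{aligned} E\bigl[|\langle \mathbb{H}g_{k',n'}, g_{k,n}\rangle|^2\bigr] &\overset{(a)}{=} E\bigl[|\langle S_{\mathbb{H}}, A_{g_{k,n},\, g_{k',n'}}\rangle|^2\bigr] \\ &\overset{(b)}{=} \iint C_{\mathbb{H}}(\nu,\tau)\, |A_{g_{k,n},\, g_{k',n'}}(\nu,\tau)|^2\, d\tau\, d\nu \\ &\overset{(c)}{=} \iint C_{\mathbb{H}}(\nu,\tau)\, |A_g(\nu + (n'-n)F,\ \tau + (k'-k)T)|^2\, d\tau\, d\nu \\ &= \iint C_{\mathbb{H}}(\nu - (n'-n)F,\ \tau - (k'-k)T)\, |A_g(\nu,\tau)|^2\, d\tau\, d\nu. \end{aligned} \tag{71}$$

Here, (a) follows from Property 5, (b) from the WSSUS property, and (c) from Property 3. We
finally substitute (71) in (70) and obtain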



$$\begin{aligned} \epsilon_4 &\le \sum_{k'=-\infty}^{\infty} \sum_{\substack{n'=-\infty \\ (k',n')\ne(k,n)}}^{\infty} \iint C_{\mathbb{H}}(\nu - (n'-n)F,\ \tau - (k'-k)T)\, |A_g(\nu,\tau)|^2\, d\tau\, d\nu \\ &= \sum_{k=-\infty}^{\infty} \sum_{\substack{n=-\infty \\ (k,n)\ne(0,0)}}^{\infty} \iint C_{\mathbb{H}}(\nu - nF,\ \tau - kT)\, |A_g(\nu,\tau)|^2\, d\tau\, d\nu. \end{aligned} \tag{72}$$

This error is small if the ambiguity surface $|A_g(\nu,\tau)|^2$ of $g(t)$ takes on small values on the periodically
repeated rectangles $[-\nu_0 + nF, \nu_0 + nF] \times [-\tau_0 + kT, \tau_0 + kT]$, except for the dashed rectangle
centered at the origin (see Fig. 2). This condition can be satisfied if the channel is highly underspread
and if the grid parameters $T$ and $F$ are chosen such that the solid rectangle centered at the origin
in Fig. 2 has large enough area to allow $|A_g(\nu,\tau)|^2$ to decay. If $g(t)$ has effective duration $T_0$ and
effective bandwidth $F_0$, the latter condition holds if $T > \tau_0 + T_0$ and $F > \nu_0 + F_0$. Given a
constraint on the product $TF$, good localization of $g(t)$, both in time and frequency, is necessary
for the two inequalities above to hold.



The minimization of $\epsilon_4$ in (72) over all orthonormal Weyl-Heisenberg sets $\{g_{k,n}(t)\}$ is a difficult
task; numerical methods to minimize $\epsilon_4$ are described in [58]. The simple rule on how to choose the
grid parameters $T$ and $F$ provided in (13) is derived from the following observation: for known $\tau_0$
and $\nu_0$, and for a fixed product $TF$, the area $4(T - \tau_0)(F - \nu_0)$ of the solid rectangle centered at
the origin in Fig. 2 is maximized if [20], [56], [58]

$$\frac{T}{F} = \frac{\tau_0}{\nu_0}.$$

Fig. 2. [...] The area on which the ambiguity function $A_g(\nu,\tau)$ should be concentrated to minimize $\epsilon_4$ is shaded in grey.
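
The grid rule can be confirmed with a brute-force search. The sketch below fixes the product $TF$, scans $T$ on a logarithmic grid, and compares the maximizer of the area $4(T-\tau_0)(F-\nu_0)$ with the value $T = \sqrt{TF\,\tau_0/\nu_0}$ implied by $T/F = \tau_0/\nu_0$. The numerical values of $\tau_0$, $\nu_0$ and $TF$ are hypothetical, normalized quantities chosen for illustration.

```python
import numpy as np

def solid_rectangle_area(T, TF, tau0, nu0):
    """Area 4 (T - tau0)(F - nu0) of the solid rectangle, with F = TF / T."""
    return 4 * (T - tau0) * (TF / T - nu0)

tau0, nu0, TF = 0.02, 0.005, 1.25       # hypothetical normalized values
# Scan only grids with T > tau0 and F > nu0.
T_grid = np.geomspace(1.5 * tau0, TF / (1.5 * nu0), 20001)
T_best = T_grid[np.argmax(solid_rectangle_area(T_grid, TF, tau0, nu0))]
T_rule = np.sqrt(TF * tau0 / nu0)       # from T / F = tau0 / nu0 and T * F = TF
print(T_best, T_rule)
```

The search and the closed-form rule agree up to the grid resolution; setting the derivative of $(T-\tau_0)(TF/T-\nu_0)$ with respect to $T$ to zero gives the same result.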

Appendix B

Lemma 11: Let $\{h[k]\}$ be a stationary random process with correlation function

$$r_h[k] = E\bigl[h[k'+k]\, h^*[k']\bigr]$$

and spectral density

$$c_h(\theta) = \sum_{k=-\infty}^{\infty} r_h[k]\, e^{-j2\pi k\theta}, \qquad |\theta| \le 1/2.$$

Furthermore, let $\mathbf{h} = [h[0]\ h[1]\ \ldots\ h[K-1]]^T$, and denote the $K \times K$ covariance matrix of $\mathbf{h}$
by $\mathbf{R}_h = E[\mathbf{h}\mathbf{h}^H]$. This covariance matrix is Hermitian Toeplitz with entries $[\mathbf{R}_h]_{i,j} = r_h[i-j]$.
Then, for any deterministic $K$-dimensional vector $\mathbf{x}$ with binary entries $\{0,1\}$ and for any $\rho > 0$,
the following inequality holds:

$$\inf_{\mathbf{x}} \frac{1}{\|\mathbf{x}\|^2} \log\det\bigl(\mathbf{I}_K + \rho\,(\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\bigr) \ge \int_{-1/2}^{1/2} \log\bigl(1 + \rho\, c_h(\theta)\bigr)\, d\theta. \tag{73}$$

Furthermore, in the limit $K \to \infty$, the above inequality is satisfied with equality if the entries of $\mathbf{x}$
are all equal to 1.
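
The statement of the lemma can be probed numerically. The sketch below is an illustration under assumptions of our own: it uses the correlation function $r_h[k] = a^{|k|}$, whose spectral density is $(1-a^2)/(1 - 2a\cos(2\pi\theta) + a^2)$, normalizes the log-determinant by the number of nonzero entries of $\mathbf{x}$, and compares a random binary vector and the all-1 vector against the integral on the right-hand side of (73). The parameter values and helper names are hypothetical.

```python
import numpy as np

def normalized_logdet(x, r, rho):
    """(1 / ||x||^2) * log det(I + rho * (x x^H) o R_h) for a binary vector x."""
    K = x.size
    idx = np.arange(K)
    R = r(idx[:, None] - idx[None, :])                 # Toeplitz covariance matrix
    _, logdet = np.linalg.slogdet(np.eye(K) + rho * np.outer(x, x) * R)
    return logdet / x.sum()

a, rho, K = 0.5, 4.0, 400                               # hypothetical example values
r = lambda k: a ** np.abs(k)                            # assumed correlation r_h[k] = a^{|k|}
theta = (np.arange(4096) + 0.5) / 4096 - 0.5
c = (1 - a**2) / (1 - 2 * a * np.cos(2 * np.pi * theta) + a**2)   # spectral density
rhs = np.mean(np.log1p(rho * c))                        # integral over one period

rng = np.random.default_rng(0)
x_random = (rng.random(K) < 0.3).astype(float)          # random on-off pattern
lhs_random = normalized_logdet(x_random, r, rho)
lhs_ones = normalized_logdet(np.ones(K), r, rho)
print(lhs_random, lhs_ones, rhs)
```

In this example both normalized log-determinants stay above the integral, and the all-1 vector comes close to it already for moderate $K$, consistent with the second statement of the lemma.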

Remark 1: The second statement in Lemma 11 (that the infimum can be achieved by an all-1
vector in the limit $K \to \infty$) was already proved in [26, Sec. VI.B]. The proof in [26] relies
on rather technical set-theoretic arguments, so that it is not easy to see how the structure of the
problem, namely the stationarity of the process $\{h[k]\}$, comes into play. Therefore, it is cumbersome to
extend the proof in [26] to accommodate two-dimensional stationary processes as used in this paper.
Here, we provide an alternative proof that is significantly shorter, explicitly uses the stationarity
property, can be directly generalized to two-dimensional stationary processes (see Corollary 13
below), and yields the new lower bound (73) as an important additional result.

Our proof is based on the relation between mutual information and MMSE discovered recently by
Guo et al. [35]. In the following lemma, we restate, for convenience, the mutual information-MMSE
relation for JPG random vectors.¹²

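Lemma 12: Let $\mathbf{h}$ be a $K$-dimensional random vector that satisfies $E[\|\mathbf{h}\|^2] < \infty$, and let $\mathbf{w}$ be
a zero-mean JPG vector, $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_K)$, that is independent of $\mathbf{h}$. Then, for any deterministic
$K$-dimensional vector $\mathbf{x}$,

$$\frac{d}{d\gamma}\, I\bigl(\sqrt{\gamma}\,\mathbf{x} \odot \mathbf{h} + \mathbf{w};\, \mathbf{h}\bigr) = E\Bigl[\bigl\|\mathbf{x} \odot \mathbf{h} - \mathbf{x} \odot E\bigl[\mathbf{h} \mid \sqrt{\gamma}\,\mathbf{x} \odot \mathbf{h} + \mathbf{w}\bigr]\bigr\|^2\Bigr]. \tag{74}$$

The expression on the RHS in (74) is the MMSE obtained when $\mathbf{x} \odot \mathbf{h}$ is estimated from the noisy
observation $\sqrt{\gamma}\,\mathbf{x} \odot \mathbf{h} + \mathbf{w}$.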


Proof of Lemma 11 • We first derive the lower bound (73 ) and then show achievability in the 



limit K — > oo in a second step. To apply Lemma [121 we rewrite the LHS of (73 ) as 



^ log det (I K + p(xx H ) R h ) = ^^/(^x h + w; h) (75) 



12 For a proof of Lemma 



12 



see [35, Sec. V.D]. 
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where w ~ C7V(0, is a JPG vector. Without loss of generality, we assume that the vector x has 

exactly M nonzero entries, with corresponding indices in the set M.. Then, 
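$$\begin{aligned}
\frac{1}{M}\, I\big(\sqrt{\rho}\,\mathbf{x} \odot \mathbf{h} + \mathbf{w}; \mathbf{h}\big)
&\overset{(a)}{=} \frac{1}{M} \int_0^{\rho} \mathbb{E}\Big[\big\|\mathbf{x} \odot \mathbf{h} - \mathbf{x} \odot \mathbb{E}\big[\mathbf{h} \mid \sqrt{\gamma}\,\mathbf{x} \odot \mathbf{h} + \mathbf{w}\big]\big\|^2\Big]\, d\gamma \\
&\overset{(b)}{=} \frac{1}{M} \int_0^{\rho} \sum_{m \in \mathcal{M}} \mathbb{E}\Big[\big|h[m] - \mathbb{E}\big[h[m] \mid \{\sqrt{\gamma}\,h[k] + w[k]\}_{k \in \mathcal{M}}\big]\big|^2\Big]\, d\gamma \\
&\overset{(c)}{\ge} \frac{1}{M} \int_0^{\rho} \sum_{m \in \mathcal{M}} \mathbb{E}\Big[\big|h[m] - \mathbb{E}\big[h[m] \mid \{\sqrt{\gamma}\,h[k] + w[k]\}_{k=-\infty}^{\infty}\big]\big|^2\Big]\, d\gamma \\
&\overset{(d)}{=} \int_0^{\rho} \mathbb{E}\Big[\big|h[0] - \mathbb{E}\big[h[0] \mid \{\sqrt{\gamma}\,h[k] + w[k]\}_{k=-\infty}^{\infty}\big]\big|^2\Big]\, d\gamma.
\end{aligned} \tag{76}$$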



Here, (a) follows from the relation between mutual information and MMSE in Lemma 12 in the form given in [35, Eq. (47)]. Equality (b) holds because $\mathbf{x}$ has exactly $M$ nonzero entries with corresponding indices in $\mathcal{M}$, and because the components of the observation that contain only noise do not influence the estimation error. The argument underlying inequality (c) is that the MMSE can only decrease if each $h[m]$ is estimated not just from a finite set of noisy observations of the random process $\{h[k]\}$, but also from noisy observations of the process' infinite past and future. This is the so-called infinite-horizon noncausal MMSE. Finally, we obtain (d) because the process $\{h[k]\}$ is stationary and its infinite-horizon noncausal MMSE is, therefore, the same for all indices $m \in \mathcal{M}$ [75, Sec. V.D.1].

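The infinite-horizon noncausal MMSE can be expressed in terms of the spectral density $c_h(\theta)$ of the process $\{h[k]\}$ [75, Eq. (V.D.28)]:

$$\mathbb{E}\Big[\big|h[0] - \mathbb{E}\big[h[0] \mid \{\sqrt{\gamma}\,h[k] + w[k]\}_{k=-\infty}^{\infty}\big]\big|^2\Big] = \int_{-1/2}^{1/2} \frac{c_h(\theta)}{1 + \gamma\, c_h(\theta)}\, d\theta. \tag{77}$$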



To obtain the desired inequality (73), we substitute (77) in (76), and (76) in (75), and note that the resulting lower bound does not depend on $\mathbf{x}$. We have therefore established a lower bound on the LHS of (73) as well. We finally integrate over $\gamma$ and get

$$\inf_{\mathbf{x}} \frac{1}{\mathbf{x}^H\mathbf{x}} \log\det\big(\mathbf{I}_K + \rho\,(\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\big) \ge \int_{-1/2}^{1/2} \int_0^{\rho} \frac{c_h(\theta)}{1 + \gamma\, c_h(\theta)}\, d\gamma\, d\theta = \int_{-1/2}^{1/2} \log\big(1 + \rho\, c_h(\theta)\big)\, d\theta.$$



To prove the second statement in Lemma 11, we choose $\mathbf{x}$ in (75) to be the all-1 vector for any dimension $K$, and evaluate the limit $K \to \infty$ of the LHS of (75) by means of Szegő's theorem on the asymptotic eigenvalue distribution of a Toeplitz matrix [31], [32]:

$$\lim_{K \to \infty} \frac{1}{K} \log\det\big(\mathbf{I}_K + \rho\,\mathbf{R}_h\big) = \int_{-1/2}^{1/2} \log\big(1 + \rho\, c_h(\theta)\big)\, d\theta. \tag{78}$$

This shows that the lower bound in (73) can indeed be achieved in the limit $K \to \infty$ when $\mathbf{x}$ is the all-1 vector. ■
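The two statements of Lemma 11 for the all-1 vector can be checked numerically. The following sketch assumes, purely for illustration, the correlation function $r_h[k] = a^{|k|}$ with $0 < a < 1$, whose spectral density is $c_h(\theta) = (1 - a^2)/(1 - 2a\cos(2\pi\theta) + a^2)$; the function names and parameter values are hypothetical. It compares $\frac{1}{K}\log\det(\mathbf{I}_K + \rho\mathbf{R}_h)$ with the integral on the RHS of (78).

```python
import math

def logdet_spd(a):
    """Log-determinant of a real symmetric positive-definite matrix (Cholesky)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
                total += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return total

def szego_check(K, a=0.5, rho=2.0, grid=4096):
    """Return ((1/K) log det(I_K + rho R_h), integral of log(1 + rho c_h))."""
    mat = [[(1.0 if p == q else 0.0) + rho * a ** abs(p - q) for q in range(K)]
           for p in range(K)]
    finite = logdet_spd(mat) / K
    limit = 0.0
    for i in range(grid):  # midpoint rule on [-1/2, 1/2]
        theta = -0.5 + (i + 0.5) / grid
        c = (1 - a * a) / (1 - 2 * a * math.cos(2 * math.pi * theta) + a * a)
        limit += math.log1p(rho * c)
    return finite, limit / grid
```

Under these assumptions, the finite-$K$ value should lie above the integral and approach it as $K$ grows.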
Our proof allows for a simple generalization of Lemma 11 to two-dimensional stationary processes, which are relevant to the problem considered in this paper. The generalization is stated in the following corollary.

Corollary 13: Let $\{h[k,n]\}$ be a random process that is stationary in $k$ and $n$ with two-dimensional correlation function $r_h[k,n] = \mathbb{E}[h[k+k', n+n']\,h^*[k',n']]$ and two-dimensional spectral density

$$c_h(\theta, \varphi) = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} r_h[k,n]\, e^{-j2\pi(k\theta - n\varphi)}, \qquad |\theta|, |\varphi| \le 1/2.$$

Furthermore, let $\mathbf{h}[k] = [h[k,0]\ h[k,1]\ \cdots\ h[k,N-1]]^T$, let the $KN$-dimensional stacked vector be $\mathbf{h} = [\mathbf{h}^T[0]\ \mathbf{h}^T[1]\ \cdots\ \mathbf{h}^T[K-1]]^T$, and denote the $KN \times KN$ covariance matrix of $\mathbf{h}$ by $\mathbf{R}_h = \mathbb{E}[\mathbf{h}\mathbf{h}^H]$. This covariance matrix is a two-level Toeplitz matrix. Then, for any $KN$-dimensional vector $\mathbf{x}$ with binary entries $\{0, 1\}$ and for any $\rho > 0$, the following inequality holds:

$$\inf_{\mathbf{x}} \frac{1}{\mathbf{x}^H\mathbf{x}} \log\det\big(\mathbf{I}_{KN} + \rho\,(\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\big) \ge \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} \log\big(1 + \rho\, c_h(\theta, \varphi)\big)\, d\theta\, d\varphi. \tag{79}$$

Furthermore, in the limit $K, N \to \infty$, the above inequality is satisfied with equality if the entries of $\mathbf{x}$ are all equal to 1.



April 10, 2008 



DRAFT 



48 



Proof: Without loss of generality, we assume that the vector $\mathbf{x}$ has exactly $M$ nonzero elements, with corresponding indices in the set $\mathcal{M}$. The arguments used in the proof of Lemma 11 directly apply, and we obtain

$$\frac{1}{\mathbf{x}^H\mathbf{x}} \log\det\big(\mathbf{I}_{KN} + \rho\,(\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\big) \ge \int_0^{\rho} \mathbb{E}\Big[\big|h[0,0] - \mathbb{E}\big[h[0,0] \mid \{\sqrt{\gamma}\,h[k,n] + w[k,n]\}_{k,n=-\infty}^{\infty}\big]\big|^2\Big]\, d\gamma.$$

To complete the proof, we use the two-dimensional counterpart of (77), i.e., the closed-form expression for the two-dimensional noncausal MMSE [76, Eq. (2.6)], and we compute the two-dimensional equivalent of (78) by means of the extension of Szegő's theorem to two-level Toeplitz matrices provided, e.g., in [33]. ■



Appendix C 

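In this appendix, we show that a sufficient condition for

$$\alpha(W) = \min\left\{1, \frac{W}{TF}\left(\frac{1}{A(W)} - \frac{1}{P}\right)\right\} = 1 \tag{80}$$

with $A(W)$ defined in (29c), is that

$$0 \le \frac{P}{W} \le \frac{1}{TF} \quad \text{and} \quad \Delta_{\mathbb{H}} \le \frac{\beta}{3TF},$$

or that

$$\frac{1}{TF} < \frac{P}{W} < \frac{\Delta_{\mathbb{H}}}{\beta}\left[\exp\left(\frac{\beta}{2TF\Delta_{\mathbb{H}}}\right) - 1\right].$$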

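For notational convenience, we set $\rho = P/W$. The necessary and sufficient condition under which (80) holds can be restated as

$$\frac{W}{A(W)} \ge \frac{1}{\rho} + TF$$

or, equivalently, as

$$\frac{1}{\beta} \int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} \log\big(1 + \rho\beta\, C_{\mathbb{H}}(\nu, \tau)\big)\, d\tau\, d\nu \le \left(\frac{1}{\rho} + TF\right)^{-1}. \tag{81}$$

We now use Jensen's inequality as in (39) to upper-bound the LHS of (81) and get the following sufficient condition for $\alpha(W) = 1$:

$$\frac{\Delta_{\mathbb{H}}}{\beta} \log\left(1 + \frac{\beta\rho}{\Delta_{\mathbb{H}}}\right) \le \left(\frac{1}{\rho} + TF\right)^{-1}. \tag{82}$$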



We next distinguish between two cases: $\rho > 1/(TF)$ and $\rho \le 1/(TF)$.

Case $\rho > 1/(TF)$: We use the inequality

$$\frac{1}{\rho} + TF < 2TF$$

to lower-bound the RHS of (82) and obtain the following sufficient condition for $\alpha(W) = 1$:

$$\frac{\Delta_{\mathbb{H}}}{\beta} \log\left(1 + \frac{\beta\rho}{\Delta_{\mathbb{H}}}\right) < \frac{1}{2TF}.$$

This condition can be expressed in terms of $\rho$ as

$$\rho < \frac{\Delta_{\mathbb{H}}}{\beta}\left[\exp\left(\frac{\beta}{2TF\Delta_{\mathbb{H}}}\right) - 1\right]. \tag{83}$$



Case $\rho \le 1/(TF)$: We further upper-bound the LHS of (82) by means of the inequality

$$\frac{1}{x} \log(1 + x) \le \frac{1}{\sqrt{1 + x}}, \qquad \text{for all } x > 0,$$

and get the following sufficient condition for $\alpha(W) = 1$:

$$\frac{\rho}{\sqrt{1 + \beta\rho/\Delta_{\mathbb{H}}}} \le \left(\frac{1}{\rho} + TF\right)^{-1}.$$

This condition is satisfied for all $\rho \in [0, 1/(TF)]$ as long as

$$\Delta_{\mathbb{H}} \le \frac{\beta}{3TF}. \tag{84}$$

If we combine (83) and (84), the sufficient condition (37) follows.
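The second case can be spot-checked numerically. The sketch below is only an illustration: it assumes the condition (82) in the form $(\Delta_{\mathbb{H}}/\beta)\log(1 + \beta\rho/\Delta_{\mathbb{H}}) \le (1/\rho + TF)^{-1}$, and the function names and parameter values are hypothetical. It evaluates (82) on a grid of $\rho \in (0, 1/(TF)]$ and also checks the logarithm inequality used above.

```python
import math

def log_bound_holds(x):
    """Check (1/x) log(1 + x) <= 1/sqrt(1 + x) for a given x > 0."""
    return math.log1p(x) / x <= 1.0 / math.sqrt(1.0 + x) + 1e-12

def condition_82_holds(rho, delta_h, beta, tf):
    """Check (delta_h/beta) log(1 + beta rho/delta_h) <= 1/(1/rho + tf)."""
    lhs = (delta_h / beta) * math.log1p(beta * rho / delta_h)
    return lhs <= 1.0 / (1.0 / rho + tf) + 1e-12

def case_two_holds(delta_h, beta, tf, steps=2000):
    """Evaluate condition (82) on a uniform grid of rho in (0, 1/tf]."""
    return all(condition_82_holds(i / (steps * tf), delta_h, beta, tf)
               for i in range(1, steps + 1))
```

With $\beta = 2$ and $TF = 1$, a spread below $\beta/(3TF)$ should pass on the whole grid, whereas a spread of twice that threshold already violates (82) at $\rho = 1/(TF)$.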



Appendix D 
Proof of Lemma 3

1) Upper bound: We restate the penalty term in (41) in the more convenient form$^{13}$

$$\frac{1}{T} \int_{-1/2}^{1/2} \log\det\left(\mathbf{I}_N + \frac{PT}{N}\,\mathbf{C}(\theta)\right) d\theta. \tag{85}$$

We seek an upper bound on (85) that can be evaluated efficiently, even for large $N$, and that is tight in the limit $N \to \infty$. To obtain such a bound, we need to solve two problems: first, the eigenvalues of the $N \times N$ Toeplitz matrix $\mathbf{C}(\theta)$ are difficult to compute; second, the determinant expression in (85) needs to be evaluated for all $\theta \in [-1/2, 1/2]$. To upper-bound (85), we will replace $\mathbf{C}(\theta)$ with a suitable circulant matrix that is asymptotically equivalent [32] to $\mathbf{C}(\theta)$. Asymptotic equivalence guarantees tightness of the resulting bound in the limit $N \to \infty$. As the eigenvalues of a circulant matrix can be computed efficiently via the discrete Fourier transform (DFT), the first problem is solved. To solve the second problem, we use Jensen's inequality.

$^{13}$ For simplicity and without loss of generality, we set $\gamma = 1$.

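We shall need the following result on the asymptotic equivalence between Toeplitz and circulant matrices.

Lemma 14 (see [77]): Let $\mathbf{T}$ be an $N \times N$ Hermitian Toeplitz matrix. Furthermore, let $\mathbf{F}$ be the DFT matrix, i.e., the matrix $\mathbf{F} = [\mathbf{f}_0\ \mathbf{f}_1\ \cdots\ \mathbf{f}_{N-1}]$ whose columns $\mathbf{f}_n = [\beta^{0n}\ \beta^{1n}\ \cdots\ \beta^{(N-1)n}]^T/\sqrt{N}$ contain powers of the $N$th root of unity, $\beta = e^{j2\pi/N}$. Construct from the matrix $\mathbf{F}^H\mathbf{T}\mathbf{F}$ the diagonal matrix $\mathbf{D}$ so that the entries on the main diagonal of $\mathbf{D}$ and on the main diagonal of $\mathbf{F}^H\mathbf{T}\mathbf{F}$ are equal. Then, $\mathbf{T}$ and the circulant matrix $\mathbf{F}\mathbf{D}\mathbf{F}^H$ are asymptotically equivalent, i.e., the Frobenius norm [64, Sec. 5.6] of the matrix $(\mathbf{T} - \mathbf{F}\mathbf{D}\mathbf{F}^H)/\sqrt{N}$ converges to zero as $N \to \infty$.

Our goal is to upper-bound a function of the form $\log\det(\mathbf{I}_N + \mathbf{T}/N)$. Because $\mathbf{F}$ is unitary, and by Hadamard's inequality,

$$\log\det\left(\mathbf{I}_N + \frac{1}{N}\mathbf{T}\right) = \log\det\left(\mathbf{I}_N + \frac{1}{N}\mathbf{F}^H\mathbf{T}\mathbf{F}\right) \le \log\det\left(\mathbf{I}_N + \frac{1}{N}\mathbf{D}\right) = \log\det\left(\mathbf{I}_N + \frac{1}{N}\mathbf{F}\mathbf{D}\mathbf{F}^H\right). \tag{86}$$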
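The bound (86) is easy to probe numerically. The sketch below is an illustration only: it assumes a real symmetric Toeplitz matrix with entries $t_n = a^{|n|}$ (a hypothetical choice), computes the diagonal entries $d_i$ of $\mathbf{F}^H\mathbf{T}\mathbf{F}$ directly from the sequence $\{t_n\}$, and compares both sides of (86).

```python
import math

def logdet_spd(a):
    """Log-determinant of a real symmetric positive-definite matrix (Cholesky)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
                total += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return total

def hadamard_check(N, a=0.5):
    """Return (log det(I + T/N), sum_i log(1 + d_i/N), d) for t_n = a^|n|."""
    t = [a ** n for n in range(N)]
    mat = [[(1.0 if p == q else 0.0) + t[abs(p - q)] / N for q in range(N)]
           for p in range(N)]
    exact = logdet_spd(mat)
    d = []
    for i in range(N):  # d_i = (1/N) sum_n (N - |n|) t_n exp(-j 2 pi i n / N)
        acc = N * t[0]
        for n in range(1, N):
            acc += 2 * (N - n) * t[n] * math.cos(2 * math.pi * i * n / N)
        d.append(acc / N)
    bound = sum(math.log1p(di / N) for di in d)
    return exact, bound, d
```

Because $\mathbf{F}$ is unitary, the $d_i$ sum to the trace of $\mathbf{T}$, which gives a convenient consistency check on the diagonal entries.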



Since $\mathbf{T}$ and $\mathbf{F}\mathbf{D}\mathbf{F}^H$ are asymptotically equivalent, we expect the difference between the LHS and the RHS of the inequality (86) to vanish as $N$ grows large. We formalize this result in the following lemma, which follows directly from Szegő's theorem on the asymptotic eigenvalue distribution of Toeplitz matrices.

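Lemma 15: Let $\{t_n\}$ be a sequence that satisfies $t_{-n} = t_n^*$ for all $n$, and has Fourier transform

$$s(\varphi) = \sum_{n=-\infty}^{\infty} t_n\, e^{-j2\pi n\varphi}, \qquad |\varphi| \le 1/2.$$

Let $\mathbf{T}$ be the $N \times N$ Hermitian Toeplitz matrix constructed as

$$\mathbf{T} = \begin{bmatrix} t_0 & t_{-1} & \cdots & t_{-(N-1)} \\ t_1 & t_0 & \cdots & t_{-(N-2)} \\ \vdots & \vdots & \ddots & \vdots \\ t_{N-1} & t_{N-2} & \cdots & t_0 \end{bmatrix}. \tag{87}$$

Then, the function $\log\det(\mathbf{I}_N + \mathbf{T}/N)$ admits the following $L$th-order Taylor series expansion around the point $1/N = 0$:

$$\log\det\left(\mathbf{I}_N + \frac{1}{N}\mathbf{T}\right) = \sum_{l=0}^{L} \frac{(-1)^l}{(l+1)N^l} \int_{-1/2}^{1/2} [s(\varphi)]^{l+1}\, d\varphi + o\left(\frac{1}{N^L}\right). \tag{88}$$

Furthermore, let $\mathbf{F}$ and $\mathbf{D}$ be as in Lemma 14. Then, $\log\det(\mathbf{I}_N + \mathbf{F}\mathbf{D}\mathbf{F}^H/N)$ has the same $L$th-order Taylor series expansion around $1/N = 0$ as $\log\det(\mathbf{I}_N + \mathbf{T}/N)$.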

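Proof: Let $\rho$ be the essential supremum of $s(\varphi)$, i.e., $\rho$ is the smallest number that satisfies $s(\varphi) \le \rho$ for all $\varphi$, except on a set of measure zero. Then for any $N$, the eigenvalues $\{\lambda_n\}_{n=0}^{N-1}$ of the matrix $\mathbf{T}$ satisfy $\lambda_n \le \rho$ [32, Lemma 6]. We now use the expansion in power series

$$\log(1 + x) = \sum_{l=1}^{\infty} \frac{(-1)^{l+1}}{l}\, x^l, \qquad \text{for } |x| < 1$$

to rewrite $f(1/N) = \log\det(\mathbf{I}_N + \mathbf{T}/N)$ as

$$f(1/N) = \sum_{n=0}^{N-1} \log\left(1 + \frac{\lambda_n}{N}\right) = \sum_{n=0}^{N-1} \sum_{l=1}^{\infty} \frac{(-1)^{l+1}}{l} \left(\frac{\lambda_n}{N}\right)^l = \sum_{l=1}^{\infty} \frac{(-1)^{l+1}}{l} \frac{1}{N^{l-1}} \left(\frac{1}{N} \sum_{n=0}^{N-1} \lambda_n^l\right), \qquad \text{for } N > \rho. \tag{89}$$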



To compute the Taylor series expansion of $f(1/N)$ around $1/N = 0$ we need to evaluate $f(1/N)$ and its derivatives for $N \to \infty$. We observe that Szegő's theorem on the asymptotic eigenvalue distribution of Toeplitz matrices implies that [32, Th. 9]

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \lambda_n^l = \int_{-1/2}^{1/2} [s(\varphi)]^l\, d\varphi. \tag{90}$$

Consequently, it follows from (89) that

$$f(0) = \lim_{N \to \infty} f(1/N) = \int_{-1/2}^{1/2} s(\varphi)\, d\varphi,$$

$$f'(0) = \lim_{N \to \infty} N\big[f(1/N) - f(0)\big] = -\frac{1}{2} \int_{-1/2}^{1/2} [s(\varphi)]^2\, d\varphi,$$

and, for the $l$th derivative,

$$f^{(l)}(0) = \lim_{N \to \infty} l!\, N^l \left[f(1/N) - f(0) - \sum_{i=1}^{l-1} \frac{1}{i!\, N^i}\, f^{(i)}(0)\right] = \frac{(-1)^l\, l!}{l+1} \int_{-1/2}^{1/2} [s(\varphi)]^{l+1}\, d\varphi.$$

The proof of the first statement in Lemma 15 is therefore concluded. The second statement follows directly from the asymptotic equivalence between $\mathbf{T}$ and $\mathbf{F}\mathbf{D}\mathbf{F}^H$ (see Lemma 14) and from [32, Th. 2].
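The first-order term of the expansion in Lemma 15 can be illustrated numerically. The sketch below assumes, for illustration only, the real sequence $t_n = a^{|n|}$, for which $\int s(\varphi)\,d\varphi = t_0 = 1$ and, by Parseval's relation, $\int [s(\varphi)]^2\,d\varphi = \sum_n t_n^2 = (1 + a^2)/(1 - a^2)$. It compares $N[f(1/N) - f(0)]$ with the predicted value $f'(0)$; the function names are hypothetical.

```python
import math

def logdet_spd(a):
    """Log-determinant of a real symmetric positive-definite matrix (Cholesky)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
                total += 2.0 * math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return total

def first_order_check(N, a=0.5):
    """Return (N [f(1/N) - f(0)], f'(0)) for the sequence t_n = a^|n|."""
    mat = [[(1.0 if p == q else 0.0) + a ** abs(p - q) / N for q in range(N)]
           for p in range(N)]
    f_n = logdet_spd(mat)
    f_0 = 1.0                                   # integral of s equals t_0
    slope = -0.5 * (1 + a * a) / (1 - a * a)    # -(1/2) integral of s^2
    return N * (f_n - f_0), slope
```

The gap between the two returned values is expected to shrink roughly like $1/N$, in line with the next term of the expansion.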



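To apply the bound (86) to our problem of upper-bounding the penalty term (85), we need to compute the diagonal entries of $\mathbf{F}^H\mathbf{C}(\theta)\mathbf{F}$. Similarly to (87), we denote the entries of the power spectral density Toeplitz matrix $\mathbf{C}(\theta)$ as $\{c_n(\theta)\}_{n=-(N-1)}^{N-1}$. As a consequence of (19) and (40), $\mathbf{C}(\theta)$ is Hermitian, i.e., $c_{-n}(\theta) = c_n^*(\theta)$. Furthermore, again by (19) and (40), each entry $c_n(\theta)$ is related to the discrete-time discrete-frequency correlation function $R_{\mathbb{H}}[k,n]$ according to

$$\begin{aligned}
c_n(\theta) &\overset{(a)}{=} \frac{1}{T} \sum_{k=-\infty}^{\infty} \int_{-\infty}^{\infty} C_{\mathbb{H}}\left(\frac{\theta - k}{T}, \tau\right) e^{-j2\pi nF\tau}\, d\tau \\
&\overset{(b)}{=} \frac{1}{T} \sum_{k=-\infty}^{\infty} \int_{-\tau_0}^{\tau_0} C_{\mathbb{H}}\left(\frac{\theta - k}{T}, \tau\right) e^{-j2\pi nF\tau}\, d\tau
\end{aligned} \tag{91}$$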



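where (a) follows from the Fourier transform relation (6), and the Poisson summation formula as in (16), and in (b) we used that $C_{\mathbb{H}}(\nu, \tau)$ is zero for $\tau$ outside $[-\tau_0, \tau_0]$. Consequently, the $i$th element on the main diagonal of $\mathbf{F}^H\mathbf{C}(\theta)\mathbf{F}$, which we denote as $d_i(\theta)$, can be expressed as a function of the entries of $\mathbf{C}(\theta)$ as follows

$$\begin{aligned}
d_i(\theta) = \mathbf{f}_i^H\, \mathbf{C}(\theta)\, \mathbf{f}_i
&= \frac{1}{N} \sum_{p=0}^{N-1} \sum_{q=0}^{N-1} c_{q-p}(\theta)\, e^{-j2\pi i(q-p)/N} \\
&= \frac{1}{N} \sum_{n=-(N-1)}^{N-1} (N - |n|)\, c_n(\theta)\, e^{-j2\pi in/N} \\
&= \frac{2}{N}\, \mathrm{Re}\left\{\sum_{n=0}^{N-1} (N - n)\, c_n(\theta)\, e^{-j2\pi in/N}\right\} - c_0(\theta)
\end{aligned} \tag{92}$$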

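where we set $n = q - p$ and used $c_{-n}(\theta) = c_n^*(\theta)$. We can now establish an upper bound on the penalty term (85) in terms of the $\{d_i(\theta)\}$ on the basis of (86):

$$\begin{aligned}
\frac{1}{T} \int_{-1/2}^{1/2} \log\det\left(\mathbf{I}_N + \frac{PT}{N}\,\mathbf{C}(\theta)\right) d\theta
&= \frac{1}{T} \int_{-1/2}^{1/2} \log\det\left(\mathbf{I}_N + \frac{PT}{N}\,\mathbf{F}^H\mathbf{C}(\theta)\mathbf{F}\right) d\theta \\
&\le \frac{1}{T} \int_{-1/2}^{1/2} \sum_{i=0}^{N-1} \log\left(1 + \frac{PT}{N}\, d_i(\theta)\right) d\theta \\
&\overset{(a)}{=} \int_{-1/(2T)}^{1/(2T)} \sum_{i=0}^{N-1} \log\left(1 + \frac{PT}{N}\, d_i(\nu T)\right) d\nu \\
&\overset{(b)}{=} \int_{-\nu_0}^{\nu_0} \sum_{i=0}^{N-1} \log\left(1 + \frac{PT}{N}\, d_i(\nu T)\right) d\nu
\end{aligned} \tag{93}$$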

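where (a) follows from the change of variables $\nu = \theta/T$ and (b) holds because $C_{\mathbb{H}}(\nu, \tau)$ is zero for $\nu$ outside $[-\nu_0, \nu_0]$, and because, by assumption, $T \le 1/(2\nu_0)$, so that $C_{\mathbb{H}}(\nu - k/T, \tau)$ is zero whenever $k \ne 0$; hence, by (91) and (92), also $c_n(\nu T)$ and $d_i(\nu T)$ are zero for $\nu$ outside $[-\nu_0, \nu_0]$. We proceed to remove the dependence on $\nu$. To this end, we further upper-bound (93) by means of Jensen's inequality and obtain the desired upper bound in (48):

$$\int_{-\nu_0}^{\nu_0} \sum_{i=0}^{N-1} \log\left(1 + \frac{PT}{N}\, d_i(\nu T)\right) d\nu \le 2\nu_0 \sum_{i=0}^{N-1} \log\left(1 + \frac{P}{2\nu_0 N}\, d_i\right) \tag{94}$$

where we set $d_i = T \int_{-\nu_0}^{\nu_0} d_i(\nu T)\, d\nu$. As we have by (91) that

$$T \int_{-\nu_0}^{\nu_0} c_n(\nu T)\, d\nu = \sum_{k=-\infty}^{\infty} \int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} C_{\mathbb{H}}\left(\nu - \frac{k}{T}, \tau\right) e^{-j2\pi nF\tau}\, d\tau\, d\nu = \int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} C_{\mathbb{H}}(\nu, \tau)\, e^{-j2\pi nF\tau}\, d\tau\, d\nu = R_{\mathbb{H}}[0, n],$$

it follows from (92) that

$$d_i = \frac{2}{N}\, \mathrm{Re}\left\{\sum_{n=0}^{N-1} (N - n)\, R_{\mathbb{H}}[0, n]\, e^{-j2\pi in/N}\right\} - 1,$$

as defined in (47).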



As a consequence of Lemma 15, the penalty term (85) and its upper bound in (93) have the same Taylor series expansion around the point $1/N = 0$, while the upper bound on the penalty term given on the RHS of (94) has the same Taylor series expansion around the point $1/N = 0$ as (85) only when the Jensen penalty in (94) is zero. This happens for scattering functions that are flat in the Doppler domain, or, equivalently, that satisfy (49).

We next provide an explicit expression for the Taylor series expansion of the penalty term (85) around $1/N = 0$; this expression will be needed in the next section, as well as in Appendix F. As the Fourier transform $\sum_{n=-\infty}^{\infty} c_n(\theta)\, e^{j2\pi n\varphi}$ of the sequence $\{c_n(\theta)\}$ is the two-dimensional power spectral density $c(\theta, \varphi)$ defined in (15), we have by Lemma 15 that

$$\begin{aligned}
\frac{1}{T} \int_{-1/2}^{1/2} \log\det\left(\mathbf{I}_N + \frac{PT}{N}\,\mathbf{C}(\theta)\right) d\theta
&= \frac{1}{T} \sum_{l=0}^{L} \frac{(-1)^l (PT)^{l+1}}{(l+1)N^l} \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} [c(\theta, \varphi)]^{l+1}\, d\theta\, d\varphi + o\left(\frac{1}{N^L}\right) \\
&= P \sum_{l=0}^{L} \frac{(-1)^l}{l+1} \left(\frac{P}{NF}\right)^l \int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} [C_{\mathbb{H}}(\nu, \tau)]^{l+1}\, d\tau\, d\nu + o\left(\frac{1}{N^L}\right)
\end{aligned} \tag{95}$$

where in the last step we first used (16) and then proceeded as in (17).

2) Lower bound: To lower-bound the penalty term (85), we use Lemma 11 in Appendix B for the case when $\mathbf{x}$ is an $N$-dimensional vector with all-1 entries and obtain

$$\begin{aligned}
\frac{1}{T} \int_{-1/2}^{1/2} \log\det\left(\mathbf{I}_N + \frac{PT}{N}\,\mathbf{C}(\theta)\right) d\theta
&\ge \frac{N}{T} \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} \log\left(1 + \frac{PT}{N}\, c(\theta, \varphi)\right) d\varphi\, d\theta \\
&= NF \int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} \log\left(1 + \frac{P}{NF}\, C_{\mathbb{H}}(\nu, \tau)\right) d\tau\, d\nu
\end{aligned} \tag{96}$$

where in the last step we again first used (16) and then proceeded as in (17). We next show that the penalty term (85) and its lower bound (96) have the same Taylor series expansion [given in (95)]. For any fixed $(\nu, \tau)$ the function $NF \log(1 + P\, C_{\mathbb{H}}(\nu, \tau)/(NF))$ is nonnegative, and monotonically increasing in $N$. Hence, by the monotone convergence theorem [78, Th. 11.28], we can expand the logarithm inside the integral on the RHS of (96) into a Taylor series. The resulting Taylor series expansion coincides with the Taylor series expansion of (85) stated in (95).



Appendix E 
Proof of Lemma 4

To prove Lemma 4, we need to evaluate $\lim_{W \to \infty} W\, U_1(W)$, where $U_1(W)$ is the upper bound in (29). Our analysis is similar to the asymptotic analysis of an upper bound on capacity in [28, Prop. 2.1], with the main difference that we deal with a time- and frequency-selective channel whereas the channel analyzed in [28] is frequency flat. We start by computing the first-order Taylor series expansion of $A(W)$ in (29c) around $1/W = 0$. This first-order Taylor series expansion follows directly from Appendix D and is given by:

$$A(W) = \frac{W}{\beta} \int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} \log\left(1 + \frac{\beta P}{W}\, C_{\mathbb{H}}(\nu, \tau)\right) d\tau\, d\nu = P - \frac{\beta P^2}{2W} \int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} C_{\mathbb{H}}^2(\nu, \tau)\, d\tau\, d\nu + o\left(\frac{1}{W}\right). \tag{97}$$



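We now use (97) to evaluate the minimum in (29b):

$$\begin{aligned}
\lim_{W \to \infty} \frac{W}{TF}\left(\frac{1}{A(W)} - \frac{1}{P}\right)
&= \lim_{W \to \infty} \frac{W}{TF}\left(\frac{1}{P - \beta\kappa_{\mathbb{H}} P^2/(2W) + o(1/W)} - \frac{1}{P}\right) \\
&= \lim_{W \to \infty} \frac{W}{TFP}\left(\left(1 - \frac{\beta P\kappa_{\mathbb{H}}}{2W} + o\left(\frac{1}{W}\right)\right)^{-1} - 1\right) \\
&\overset{(a)}{=} \lim_{W \to \infty} \frac{W}{TFP}\left(\frac{\beta P\kappa_{\mathbb{H}}}{2W} + o\left(\frac{1}{W}\right)\right) = \frac{\beta\kappa_{\mathbb{H}}}{2TF}
\end{aligned} \tag{98}$$

where we used the Taylor series expansion $1/(1 - x) = 1 + x + o(x)$ for $x \to 0$ to obtain equality (a).

Because $\alpha(W)$ is defined in (29b) as the minimum

$$\alpha(W) = \min\left\{1, \frac{W}{TF}\left(\frac{1}{A(W)} - \frac{1}{P}\right)\right\}$$

we need to distinguish two cases.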

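• If $\beta > 2TF/\kappa_{\mathbb{H}}$, we get $\lim_{W \to \infty} \alpha(W) = 1$, so that, for sufficiently large bandwidth, the upper bound (29a) can be expressed as

$$U_1(W) = \frac{W}{TF} \log\left(1 + P\,\frac{TF}{W}\right) - A(W) = \frac{P^2}{2W}\big(\beta\kappa_{\mathbb{H}} - TF\big) + o\left(\frac{1}{W}\right).$$

Consequently, we obtain the first-order Taylor series coefficient

$$c = \lim_{W \to \infty} W\, U_1(W) = \frac{P^2}{2}\big(\beta\kappa_{\mathbb{H}} - TF\big).$$

• If $\beta \le 2TF/\kappa_{\mathbb{H}}$, we get

$$\lim_{W \to \infty} \alpha(W) = \lim_{W \to \infty} \frac{W}{TF}\left(\frac{1}{A(W)} - \frac{1}{P}\right)$$

so that for sufficiently large bandwidth

$$U_1(W) = \frac{W}{TF}\left(\frac{A(W)}{P} - 1 - \log\frac{A(W)}{P}\right). \tag{100}$$

We now use the Taylor series $x - \log(1 + x) = x^2/2 + o(x^2)$ for $x \to 0$ on the RHS of (100) to get

$$U_1(W) = \frac{W}{2TF}\left(\frac{A(W)}{P} - 1\right)^2 + o\left(\frac{1}{W}\right) \overset{(a)}{=} \frac{W}{2TF}\left(-\frac{\beta P\kappa_{\mathbb{H}}}{2W} + o\left(\frac{1}{W}\right)\right)^2 + o\left(\frac{1}{W}\right) = \frac{(\beta P\kappa_{\mathbb{H}})^2}{8TF\,W} + o\left(\frac{1}{W}\right) \tag{101}$$

where (a) follows from the Taylor series expansion of $A(W)$ in (97). Hence, the first-order Taylor series coefficient of the upper bound $U_1(W)$ is given by

$$c = \lim_{W \to \infty} W\, U_1(W) = \frac{(\beta P\kappa_{\mathbb{H}})^2}{8TF}.$$

Both cases taken together yield (54).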
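The limit (98) and the two-case coefficient can be illustrated numerically. The sketch below assumes, for illustration only, a brick-shaped scattering function that equals $1/\Delta_{\mathbb{H}}$ on a support of area $\Delta_{\mathbb{H}}$, so that $\kappa_{\mathbb{H}} = 1/\Delta_{\mathbb{H}}$ and $A(W) = (W\Delta_{\mathbb{H}}/\beta)\log(1 + \beta P/(W\Delta_{\mathbb{H}}))$. The function names and parameter values are hypothetical.

```python
import math

def a_brick(W, P, beta, delta_h):
    """A(W) of (97) for a brick-shaped scattering function of spread delta_h."""
    x = beta * P / (W * delta_h)
    return (W * delta_h / beta) * math.log1p(x)

def min_argument(W, P, beta, delta_h, tf):
    """Second argument of the minimum in (29b): (W/TF) (1/A(W) - 1/P)."""
    return (W / tf) * (1.0 / a_brick(W, P, beta, delta_h) - 1.0 / P)

def first_order_coefficient(P, beta, kappa, tf):
    """First-order Taylor coefficient c of the upper bound, both cases."""
    if beta > 2.0 * tf / kappa:
        return 0.5 * P * P * (beta * kappa - tf)
    return (beta * P * kappa) ** 2 / (8.0 * tf)
```

For large $W$, `min_argument` should approach $\beta\kappa_{\mathbb{H}}/(2TF)$, and the two branches of `first_order_coefficient` agree at $\beta = 2TF/\kappa_{\mathbb{H}}$, where both evaluate to $P^2\,TF/2$.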
Appendix F
Proof of Lemma 5

To prove Lemma 5, we need to evaluate $\lim_{W \to \infty} W\, L_1(W)$, where $L_1(W)$ is the lower bound (41). The first term in (41) is the coherent mutual information of a scalar Rayleigh-fading channel with zero-mean constant-modulus input. This mutual information has the following first-order Taylor series expansion around $1/W = 0$ [14, Th. 14]:

$$\frac{W}{\gamma TF}\, I(y; x \mid h) = P - \frac{\gamma P^2\, TF}{W} + o\left(\frac{1}{W}\right). \tag{102}$$

We now analyze the second term in (41); its Taylor series expansion around $1/W = 0$ (for the case $\gamma = 1$) is given in (95). If we truncate this expansion to first order and take into account the factor $\gamma$, we obtain

$$\frac{1}{\gamma T} \int_{-1/2}^{1/2} \log\det\left(\mathbf{I}_N + \frac{\gamma PT}{N}\,\mathbf{C}(\theta)\right) d\theta = P - \frac{\gamma\kappa_{\mathbb{H}} P^2}{2W} + o\left(\frac{1}{W}\right) \tag{103}$$

where $\kappa_{\mathbb{H}}$ is defined in (53). We then combine (102) and (103) to get the desired result

$$\lim_{W \to \infty} W\, L_1(W) = \lim_{W \to \infty} \max_{1 \le \gamma \le \beta} W\left[P - \frac{\gamma P^2\, TF}{W} - P + \frac{\gamma\kappa_{\mathbb{H}} P^2}{2W} + o\left(\frac{1}{W}\right)\right] = \beta P^2\big(\kappa_{\mathbb{H}}/2 - TF\big).$$



Appendix G 
Proof of Theorem 6

To prove Theorem 6, we need to find a lower bound on $C(W)$ whose first-order Taylor series expansion matches that of the upper bound $U_1(W)$ given in (54). To obtain such a lower bound, we compute the mutual information for a specific input distribution that (slightly) generalizes the input distribution used in [28]. For a given time duration $KT$ and bandwidth $NF$, we shall first specify the distribution of the input symbols that belong to a generic $K' \times N'$ rectangular block in the time-frequency plane, where $K'$ and $N'$ are fixed and $K' \le K$, $N' \le N$, and then describe the joint distribution of all input symbols in the overall $K \times N$ rectangle; transmission over the $K \times N$ rectangle is denoted as a channel use. Within a $K' \times N'$ block, we use i.i.d. zero-mean constant-modulus signals. We arrange these signals in a $K'N'$-dimensional vector $\tilde{\mathbf{d}}$ in the same way as in (20), i.e., we stack first in frequency and then in time. Finally, we let the input vector for the $K' \times N'$ block be $\tilde{\mathbf{x}} = b\,\tilde{\mathbf{d}}$, where $b$ is a binary RV with distribution

$$b = \begin{cases} \sqrt{\beta PT/N}, & \text{with probability } \zeta \\ 0, & \text{with probability } 1 - \zeta. \end{cases}$$

This means that the i.i.d. constant-modulus vector $\tilde{\mathbf{d}}$ undergoes on-off modulation with duty cycle $\zeta$.



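The above signaling scheme satisfies the peak constraint (24) by construction. The covariance matrix of the input vector $\tilde{\mathbf{x}}$ is given by

$$\mathbb{E}\big[\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H\big] = \mathbb{E}_b\big[\mathbb{E}_{\tilde{\mathbf{x}}}\big[\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H \mid b\big]\big] = \zeta\,\frac{\beta PT}{N}\,\mathbf{I}_{K'N'}$$

so that for $\zeta \le 1/\beta$ the signaling scheme also satisfies the power constraint $\mathbb{E}[\|\tilde{\mathbf{x}}\|^2] \le K'N'PT/N$. In the remainder of this appendix we will assume that $\zeta \le 1/\beta$. The input-output relation for the transmission of the $K' \times N'$ block can now be written as

$$\tilde{\mathbf{y}} = \tilde{\mathbf{x}} \odot \tilde{\mathbf{h}} + \tilde{\mathbf{w}}$$

where the $K'N'$-dimensional stacked output vector $\tilde{\mathbf{y}}$, the corresponding stacked channel vector $\tilde{\mathbf{h}}$, and the stacked noise vector $\tilde{\mathbf{w}}$ are defined in the same way as the stacked input vector $\tilde{\mathbf{x}}$. Finally, we define the correlation matrix of the channel vector $\tilde{\mathbf{h}}$ as $\mathbf{R}_{\tilde{h}} = \mathbb{E}[\tilde{\mathbf{h}}\tilde{\mathbf{h}}^H]$.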

Let now $l = \lfloor K/K' \rfloor$ and $m = \lfloor N/N' \rfloor$. In a channel use, we let the $KN$-dimensional input vector $\mathbf{s}$ with entries $\{s[k,n]\}$ be constructed as follows: we use $lK' \cdot mN'$ out of the $KN$ entries of $\mathbf{s}$ to form $lm$ subvectors, each of dimension $K'N'$, and we leave the remaining $KN - lK' \cdot mN'$ entries unused. For $p = 0, 1, \ldots, l-1$ and $q = 0, 1, \ldots, m-1$, the $(p,q)$th subvector is constructed from the entries of $\mathbf{s}$ in the set $\{s[k,n] : k = pK', pK'+1, \ldots, (p+1)K'-1;\ n = qN', qN'+1, \ldots, (q+1)N'-1\}$. Finally, we assume that the $lm$ subvectors are independent and are distributed as $\tilde{\mathbf{x}}$, so that

$$\mathbb{E}[\|\mathbf{s}\|^2] = lm\,\mathbb{E}[\|\tilde{\mathbf{x}}\|^2] \le lm\,K'N'PT/N \le KPT.$$

Hence, the vector $\mathbf{s}$ satisfies both the average power constraint and the peak constraint (24) in Section II-E. Finally, we have

$$\begin{aligned}
C(W) = \lim_{K \to \infty} \frac{1}{KT} \sup I(\mathbf{y}; \mathbf{x}) &\ge \lim_{K \to \infty} \frac{1}{KT}\, I(\mathbf{y}; \mathbf{s}) \\
&\overset{(a)}{\ge} \lim_{K \to \infty} \frac{lm}{KT}\, I(\tilde{\mathbf{y}}; \tilde{\mathbf{x}}) \\
&\overset{(b)}{=} \frac{m}{K'T}\, I(\tilde{\mathbf{y}}; \tilde{\mathbf{x}})
\end{aligned} \tag{104}$$

where (a) follows from the chain rule of mutual information (the intermediate steps are detailed in [28, App. A]), and in (b) we used

$$\lim_{K \to \infty} \frac{l}{K} = \lim_{K \to \infty} \frac{\lfloor K/K' \rfloor}{K} = \frac{1}{K'}.$$

Because we are only interested in the asymptotic behavior of the lower bound (104), it suffices to analyze the second-order Taylor series expansion of $I(\tilde{\mathbf{y}}; \tilde{\mathbf{x}})$ around $1/N = 0$. As the entries of $\tilde{\mathbf{x}}$ are peak-constrained, and $\tilde{\mathbf{h}}$ is a proper complex vector, we can use the expansion derived in [79, Cor. 1] to obtain$^{14}$


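$$I(\tilde{\mathbf{y}}; \tilde{\mathbf{x}}) = \frac{1}{2}\,\mathrm{tr}\Big\{\mathbb{E}_{\tilde{x}}\Big[\big(\mathbb{E}_{\tilde{h}}\big[(\tilde{\mathbf{h}} \odot \tilde{\mathbf{x}})(\tilde{\mathbf{h}} \odot \tilde{\mathbf{x}})^H\big]\big)^2\Big]\Big\} - \frac{1}{2}\,\mathrm{tr}\Big\{\big(\mathbb{E}_{\tilde{x},\tilde{h}}\big[(\tilde{\mathbf{h}} \odot \tilde{\mathbf{x}})(\tilde{\mathbf{h}} \odot \tilde{\mathbf{x}})^H\big]\big)^2\Big\} + o\left(\frac{1}{N^2}\right). \tag{105}$$

In the following, we analyze the two trace terms separately. The first term is:

$$\begin{aligned}
\mathrm{tr}\Big\{\mathbb{E}_{\tilde{x}}\Big[\big(\mathbb{E}_{\tilde{h}}\big[(\tilde{\mathbf{h}} \odot \tilde{\mathbf{x}})(\tilde{\mathbf{h}} \odot \tilde{\mathbf{x}})^H\big]\big)^2\Big]\Big\}
&\overset{(a)}{=} \mathrm{tr}\Big\{\mathbb{E}_{\tilde{x}}\Big[\big(\mathbf{R}_{\tilde{h}} \odot (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H)\big)^2\Big]\Big\} \\
&\overset{(b)}{=} \mathrm{tr}\Big\{\mathbb{E}_{\tilde{x}}\Big[\big(\mathbf{R}_{\tilde{h}} \odot (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H)\big)^H \big(\mathbf{R}_{\tilde{h}} \odot (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H)\big)\Big]\Big\} \\
&\overset{(c)}{=} \mathrm{tr}\Big\{\mathbb{E}_{\tilde{x}}\Big[\mathbf{R}_{\tilde{h}}^H \big((\tilde{\mathbf{x}}^*\tilde{\mathbf{x}}^T) \odot \mathbf{R}_{\tilde{h}} \odot (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H)\big)\Big]\Big\} \\
&\overset{(d)}{=} \mathrm{tr}\Big\{\mathbf{R}_{\tilde{h}}^H \Big(\mathbf{R}_{\tilde{h}} \odot \mathbb{E}_{\tilde{x}}\big[(\tilde{\mathbf{x}}^*\tilde{\mathbf{x}}^T) \odot (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H)\big]\Big)\Big\} \\
&\overset{(e)}{=} \zeta \left(\frac{\beta PT}{N}\right)^2 \mathrm{tr}\big\{\mathbf{R}_{\tilde{h}}^H \mathbf{R}_{\tilde{h}}\big\}.
\end{aligned} \tag{106}$$

Here, (a) follows from (27), (b) follows because $\mathbf{R}_{\tilde{h}}$ and $\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H$ are Hermitian, and (c) follows from the identity [80, p. 42]

$$\mathrm{tr}\{(\mathbf{A} \odot \mathbf{B})^H \mathbf{C}\} = \mathrm{tr}\{\mathbf{A}^H (\mathbf{B}^* \odot \mathbf{C})\}.$$

We obtain (d) as the Hadamard product is commutative, and (e) holds because the entries of the matrix $(\tilde{\mathbf{x}}^*\tilde{\mathbf{x}}^T) \odot (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H)$ are all equal to $(\beta PT)^2/N^2$ w.p.1 given that $b = \sqrt{\beta PT/N}$.

$^{14}$ Differently from [79, Cor. 1], the Taylor series expansion is for $N \to \infty$; furthermore, we have $N_0 = 1$, and the SNR is given by $K'N'PT/N$.

To evaluate the second trace term in (105), we once more use the identity (27):

$$\mathrm{tr}\Big\{\big(\mathbb{E}_{\tilde{x},\tilde{h}}\big[(\tilde{\mathbf{h}} \odot \tilde{\mathbf{x}})(\tilde{\mathbf{h}} \odot \tilde{\mathbf{x}})^H\big]\big)^2\Big\} = \left(\frac{\zeta\beta PT}{N}\right)^2 \mathrm{tr}\big\{(\mathbf{R}_{\tilde{h}} \odot \mathbf{I}_{K'N'})^2\big\} = \left(\frac{\zeta\beta PT}{N}\right)^2 K'N' \tag{107}$$

where the last equality follows because we normalized $R_{\mathbb{H}}[0,0] = \sigma_{\mathbb{H}}^2 = 1$ (see Section II-D).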



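Next, we substitute the trace terms (106) and (107) into the second-order expansion of mutual information in (105), which, together with the lower bound in (104), results in the following lower bound on $\lim_{W \to \infty} W\, C(W)$, valid for any fixed $K'$ and $N'$:

$$\begin{aligned}
\lim_{W \to \infty} W\, C(W) &\ge \lim_{N \to \infty} \frac{mNF}{K'T}\, I(\tilde{\mathbf{y}}; \tilde{\mathbf{x}}) \\
&= \lim_{N \to \infty} \frac{mNF}{2K'T}\left[\zeta\left(\frac{\beta PT}{N}\right)^2 \mathrm{tr}\big\{\mathbf{R}_{\tilde{h}}^H \mathbf{R}_{\tilde{h}}\big\} - \left(\frac{\zeta\beta PT}{N}\right)^2 K'N' + o\left(\frac{1}{N^2}\right)\right] \\
&= \lim_{N \to \infty} \frac{m}{N}\,\frac{\zeta(\beta P)^2}{2}\left[\frac{TF}{K'}\,\mathrm{tr}\big\{\mathbf{R}_{\tilde{h}}^H \mathbf{R}_{\tilde{h}}\big\} - \zeta N'\,TF\right] \\
&= \frac{\zeta(\beta P)^2}{2}\left[\frac{TF}{K'N'}\,\mathrm{tr}\big\{\mathbf{R}_{\tilde{h}}^H \mathbf{R}_{\tilde{h}}\big\} - \zeta\,TF\right]
\end{aligned} \tag{108}$$

where in the last step we used $\lim_{N \to \infty} m/N = \lim_{N \to \infty} \lfloor N/N' \rfloor / N = 1/N'$.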

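If we now take $K'$ and $N'$ sufficiently large, the RHS of (108) can be made arbitrarily close to its limit for $K' \to \infty$ and $N' \to \infty$. This limit admits a closed-form expression in $C_{\mathbb{H}}(\nu, \tau)$. In fact,

$$\begin{aligned}
\lim_{K', N' \to \infty} \frac{1}{K'N'}\,\mathrm{tr}\big\{\mathbf{R}_{\tilde{h}}^H \mathbf{R}_{\tilde{h}}\big\}
&\overset{(a)}{=} \lim_{K', N' \to \infty} \frac{1}{K'N'} \sum_{k=1}^{K'} \sum_{n=1}^{N'} \lambda_{k,n}^2 \\
&\overset{(b)}{=} \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} c^2(\theta, \varphi)\, d\theta\, d\varphi \\
&\overset{(c)}{=} \frac{1}{TF} \int_{-\nu_0}^{\nu_0} \int_{-\tau_0}^{\tau_0} [C_{\mathbb{H}}(\nu, \tau)]^2\, d\tau\, d\nu.
\end{aligned} \tag{109}$$

Here, (a) follows because $\mathbf{R}_{\tilde{h}}$ is Hermitian and its $K'N'$ eigenvalues $\{\lambda_{k,n}\}$ are real. The matrix $\mathbf{R}_{\tilde{h}}$ is two-level Toeplitz and its entries belong to the sequence $\{R_{\mathbb{H}}[k,n]\}$ with two-dimensional power spectral density $c(\theta, \varphi)$ defined in (15); then, (b) follows from the extension of (90) to two-level Toeplitz matrices provided in [33]. Finally, to obtain (c) we proceed as in (17). If we now replace (109) in (108) for $K' \to \infty$ and $N' \to \infty$ we obtain

$$\lim_{W \to \infty} W\, C(W) \ge \frac{\zeta(\beta P)^2}{2}\big(\kappa_{\mathbb{H}} - \zeta\,TF\big). \tag{110}$$

If we choose $\zeta = 1/\beta$ whenever $\beta > 2TF/\kappa_{\mathbb{H}}$, and $\zeta = \kappa_{\mathbb{H}}/(2TF)$ otherwise, the limit (110) equals the first-order Taylor series coefficient $c$ of the upper bound $U_1(W)$ in (54b). Hence, the first-order Taylor series expansion of the lower bound (108) can be made to match the first-order Taylor series expansion of the upper bound (29) as closely as desired.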
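The choice of the duty cycle can be illustrated with a short computation. The sketch below assumes that the limit in (110) has the form $\frac{\zeta(\beta P)^2}{2}(\kappa_{\mathbb{H}} - \zeta\,TF)$ and that $\zeta$ is restricted to $(0, 1/\beta]$; the function names and parameter values are hypothetical. It picks the duty cycle as described above and compares the resulting value with the two-case coefficient of the upper bound.

```python
import math

def duty_cycle_limit(zeta, P, beta, kappa, tf):
    """Assumed form of the limit (110) as a function of the duty cycle zeta."""
    return 0.5 * zeta * (beta * P) ** 2 * (kappa - zeta * tf)

def best_duty_cycle(beta, kappa, tf):
    """zeta = 1/beta if beta > 2TF/kappa, and kappa/(2TF) otherwise."""
    return min(1.0 / beta, kappa / (2.0 * tf))

def upper_bound_coefficient(P, beta, kappa, tf):
    """First-order Taylor coefficient c of the upper bound, both cases."""
    if beta > 2.0 * tf / kappa:
        return 0.5 * P * P * (beta * kappa - tf)
    return (beta * P * kappa) ** 2 / (8.0 * tf)
```

Since the expression is a concave quadratic in $\zeta$, its unconstrained maximizer is $\kappa_{\mathbb{H}}/(2TF)$, and the constraint $\zeta \le 1/\beta$ becomes active exactly when $\beta > 2TF/\kappa_{\mathbb{H}}$.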



Appendix H 
Proof of Theorem 7

To obtain a lower bound on $C_\infty$, we compute the rate achievable in the infinite-bandwidth limit for a specific signaling scheme. Similarly to the proof of Theorem 6 in Appendix G, it suffices to specify only the distribution of the input symbols that belong to a generic rectangular block in the time-frequency plane. Differently from Appendix G, we take the generic block to be of dimension $K' \times N$, where $K'$ is fixed and $K' \le K$. We denote the input symbols in each time-frequency slot of the $K' \times N$ block as $x[k,n]$ and arrange them in a vector where, differently from Section II-D, we first stack along time and then along frequency. The $K'$-dimensional vector that contains the input symbols in the $n$th frequency slot is defined as

$$\tilde{\mathbf{x}}[n] = \big[x[0,n]\ x[1,n]\ \cdots\ x[K'-1,n]\big]^T$$

and the $K'N$-dimensional vector that contains all symbols in the block is

$$\tilde{\mathbf{x}} = \big[\tilde{\mathbf{x}}^T[0]\ \tilde{\mathbf{x}}^T[1]\ \cdots\ \tilde{\mathbf{x}}^T[N-1]\big]^T. \tag{111}$$

We define the stacked channel vector $\tilde{\mathbf{h}}$, the stacked noise vector $\tilde{\mathbf{w}}$, and the stacked output vector $\tilde{\mathbf{y}}$ in a similar way. The input-output relation corresponding to the $K' \times N$ block is

$$\tilde{\mathbf{y}} = \tilde{\mathbf{x}} \odot \tilde{\mathbf{h}} + \tilde{\mathbf{w}}. \tag{112}$$

Finally, we denote the correlation matrix of the channel vector $\tilde{\mathbf{h}}$ by $\mathbf{R}_{\tilde{h}}$; this matrix is again two-level Toeplitz. Within the $K' \times N$ block, we use a signaling scheme that is a generalization of the on-off FSK scheme proposed in [67], and can be viewed as FSK in the channel's eigenspace.

Fig. 3. Slots in the time-frequency plane occupied by the symbol $\tilde{\mathbf{x}}_3$ for the case $K' = 4$.

Definition 16 (On-off Weyl-Heisenberg keying, OO-WHK): Let $\tilde{\mathbf{x}}_i$ for $i = 0, 1, \ldots, N-1$ denote a $K'N$-dimensional vector with entries $x_i[k,n]$ that satisfy $|x_i[k,n]|^2 = \beta PT\,\delta[i - n]$. We transmit each $\tilde{\mathbf{x}}_i$ with probability $1/(N\beta)$, for $i = 0, 1, \ldots, N-1$, and the all-zero $K'N$-dimensional vector with probability $1 - 1/\beta$.
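A minimal sketch of how OO-WHK symbols could be generated is given below. It assumes real, nonnegative amplitudes (Definition 16 only fixes the squared magnitudes), represents a symbol as a list of $K'$ time slots with $N$ frequency entries each, and uses hypothetical function names.

```python
import math
import random

def oo_whk_symbol(i, k_prime, n_freq, beta, P, T):
    """Symbol x_i: amplitude sqrt(beta P T) in frequency slot i of every time slot."""
    amp = math.sqrt(beta * P * T)
    return [[amp if n == i else 0.0 for n in range(n_freq)]
            for _ in range(k_prime)]

def draw_oo_whk(k_prime, n_freq, beta, P, T, rng):
    """Each x_i is drawn w.p. 1/(N beta); the all-zero symbol w.p. 1 - 1/beta."""
    if rng.random() < 1.0 / beta:
        return oo_whk_symbol(rng.randrange(n_freq), k_prime, n_freq, beta, P, T)
    return [[0.0] * n_freq for _ in range(k_prime)]
```

Every nonzero symbol places energy $\beta PT$ in each time slot, concentrated in a single frequency slot, so that the average energy per block is $K'PT$.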

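Fig. 3 shows the time-frequency slots occupied by the symbol $\tilde{\mathbf{x}}_3$ for $K' = 4$. Steps similar to the ones detailed in Appendix G [see (104)] yield the following lower bound on $C_\infty$:

$$C_\infty = \lim_{W \to \infty} \lim_{K \to \infty} \frac{1}{KT} \sup I(\mathbf{y}; \mathbf{x}) \ge \lim_{N \to \infty} \frac{1}{K'T}\, I(\tilde{\mathbf{y}}; \tilde{\mathbf{x}}). \tag{113}$$

Since this lower bound holds for any finite $K'$, we can tighten it if we take the supremum over $K'$; this leads to

$$C_\infty \ge \sup_{K'} \lim_{N \to \infty} \frac{1}{K'T}\, I(\tilde{\mathbf{y}}; \tilde{\mathbf{x}}). \tag{114}$$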






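We next decompose the mutual information in (114) as the difference of KL divergences [81, Eq. (10)]

$$\frac{1}{K'T}\, I(\tilde{\mathbf{y}}; \tilde{\mathbf{x}}) = \frac{1}{K'T}\, \mathbb{E}_{\tilde{x}}\big[D\big(Q_{\tilde{y}|\tilde{x}} \,\|\, Q_{\tilde{y}|\tilde{x}=0}\big)\big] - \frac{1}{K'T}\, D\big(Q_{\tilde{y}} \,\|\, Q_{\tilde{y}|\tilde{x}=0}\big) \tag{115}$$

and evaluate the two terms separately. As $Q_{\tilde{y}|\tilde{x}} = \mathcal{CN}(\mathbf{0}, \mathbf{I}_{K'N} + (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H) \odot \mathbf{R}_{\tilde{h}})$, we can use the closed-form expression for the KL divergence of two JPG random vectors $\mathbf{a} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_a)$ and $\mathbf{b} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$ [14, Eq. (59)]

$$D\big(\mathcal{CN}(\mathbf{0}, \mathbf{R}_a) \,\|\, \mathcal{CN}(\mathbf{0}, \mathbf{I})\big) = \mathrm{tr}(\mathbf{R}_a - \mathbf{I}) - \log\det(\mathbf{R}_a). \tag{116}$$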

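Thus, the expected divergence in (115) can be expressed as

$$\begin{aligned}
\frac{1}{K'T}\, \mathbb{E}_{\tilde{x}}\big[D\big(Q_{\tilde{y}|\tilde{x}} \,\|\, Q_{\tilde{y}|\tilde{x}=0}\big)\big]
&= \frac{1}{K'T}\, \mathbb{E}_{\tilde{x}}\big[\mathrm{tr}\big\{(\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H) \odot \mathbf{R}_{\tilde{h}}\big\}\big] - \frac{1}{K'T}\, \mathbb{E}_{\tilde{x}}\big[\log\det\big(\mathbf{I}_{K'N} + (\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H) \odot \mathbf{R}_{\tilde{h}}\big)\big] \\
&= P - \frac{1}{\beta K'TN} \sum_{i=0}^{N-1} \log\det\big(\mathbf{I}_{K'N} + (\tilde{\mathbf{x}}_i\tilde{\mathbf{x}}_i^H) \odot \mathbf{R}_{\tilde{h}}\big).
\end{aligned} \tag{117}$$

The last step follows because each nonzero vector is transmitted with probability $1/(N\beta)$ in the OO-WHK signaling scheme of Definition 16 and because the diagonal entries of $\mathbf{R}_{\tilde{h}}$ are normalized to 1. We next exploit the structure of the signaling scheme, and the fact that the correlation matrix $\mathbf{R}_{\tilde{h}}$ is two-level Toeplitz, to simplify the determinant in the second term on the RHS of (117) as

$$\det\big(\mathbf{I}_{K'N} + (\tilde{\mathbf{x}}_i\tilde{\mathbf{x}}_i^H) \odot \mathbf{R}_{\tilde{h}}\big) = \det\big(\mathbf{I}_{K'} + \beta PT\,\mathbf{R}_{\tilde{h}}[0]\big) \tag{118}$$

for all $i$, and where $\tilde{\mathbf{h}}[0] = [h[0,0]\ h[1,0]\ \cdots\ h[K'-1,0]]^T$ and $\mathbf{R}_{\tilde{h}}[0] = \mathbb{E}[\tilde{\mathbf{h}}[0]\tilde{\mathbf{h}}^H[0]]$. We next substitute our intermediate results (115), (117), and (118) into the lower bound (114) to obtain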



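$$C_\infty \ge P - \inf_{K'}\left\{\frac{1}{\beta K'T} \log\det\big(\mathbf{I}_{K'} + \beta PT\,\mathbf{R}_{\tilde{h}}[0]\big) + \lim_{N \to \infty} \frac{1}{K'T}\, D\big(Q_{\tilde{y}} \,\|\, Q_{\tilde{y}|\tilde{x}=0}\big)\right\}. \tag{119}$$

In Appendix I it is shown that

$$\lim_{N \to \infty} \frac{1}{K'T}\, D\big(Q_{\tilde{y}} \,\|\, Q_{\tilde{y}|\tilde{x}=0}\big) = 0.$$

To conclude, we simplify the second term on the RHS of (119) as

$$\begin{aligned}
\inf_{K'} \frac{1}{\beta K'T} \log\det\big(\mathbf{I}_{K'} + \beta PT\,\mathbf{R}_{\tilde{h}}[0]\big)
&\overset{(a)}{=} \frac{1}{\beta T} \int_{-1/2}^{1/2} \log\left(1 + \beta P \sum_{k=-\infty}^{\infty} q_{\mathbb{H}}\left(\frac{\theta - k}{T}\right)\right) d\theta \\
&\overset{(b)}{=} \frac{1}{\beta T} \int_{-1/2}^{1/2} \log\left(1 + \beta P\, q_{\mathbb{H}}\left(\frac{\theta}{T}\right)\right) d\theta.
\end{aligned}$$

Here, in (a) we used Lemma 11 in Appendix B for the case when $\mathbf{x}$ is a $K'$-dimensional vector with all-1 entries, as well as

$$c(\theta) = \sum_{k=-\infty}^{\infty} R_{\mathbb{H}}[k, 0]\, e^{-j2\pi k\theta} = \frac{1}{T} \sum_{k=-\infty}^{\infty} q_{\mathbb{H}}\left(\frac{\theta - k}{T}\right).$$

Finally, (b) holds because $q_{\mathbb{H}}(\nu)$ is compactly supported on $[-\nu_0, \nu_0]$, and $T \le 1/(2\nu_0)$. A change of variables $\nu = \theta/T$ yields the final result.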



Appendix I 



Lemma 17: Consider a channel with input-output relation$^{15}$

$$\mathbf{y} = \mathbf{x} \odot \mathbf{h} + \mathbf{w}$$

where the $K'N$-dimensional vectors $\mathbf{y}$, $\mathbf{x}$, $\mathbf{h}$, and $\mathbf{w}$ are defined as in (111), i.e., stacking is first along time and then along frequency. Then,

$$\lim_{N \to \infty} \frac{1}{K'T}\, D\big(Q_y \,\|\, Q_{y|x=0}\big) = 0 \tag{120}$$

for the OO-WHK scheme in Definition 16 of Appendix H.

Proof: Let $q_y$ and $q_{y|x}$ be the probability density functions (PDFs) associated with the probability distributions $Q_y$ and $Q_{y|x}$, respectively. By definition of the KL divergence,

$$D\big(Q_y \,\|\, Q_{y|x=0}\big) = \mathbb{E}_y\left[\log\frac{q_y(\mathbf{y})}{q_{y|x=0}(\mathbf{y})}\right]. \tag{121}$$

$^{15}$ To keep the notation compact, in this appendix we drop the tilde notation [cf. (112)].






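For the OO-WHK scheme in Definition 16, the PDF $q_y$ of the output vector can be written as

$$q_y(\mathbf{y}) = \left(1 - \frac{1}{\beta}\right) q_{y|x=0}(\mathbf{y}) + \frac{1}{N\beta} \sum_{i=0}^{N-1} q_{y|x=x_i}(\mathbf{y}). \tag{122}$$

The output random vector $\mathbf{y}$ has the same distribution as the noise vector $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{K'N})$ when $\mathbf{x} = \mathbf{0}$. Hence, $q_{y|x=0} = q_w$. To express (121) in a more convenient form, we define the following RV:

$$S_N(\mathbf{w}) = \sum_{i=0}^{N-1} s_i(\mathbf{w}), \qquad s_i(\mathbf{w}) = 1 - \frac{1}{\beta} + \frac{1}{\beta}\,\frac{q_{y|x=x_i}(\mathbf{w})}{q_w(\mathbf{w})}.$$

By (122), $q_y(\mathbf{w})/q_w(\mathbf{w}) = S_N(\mathbf{w})/N$, so that we can express the KL divergence (121) as a function of the RV $S_N(\mathbf{w})$ as follows:

$$D\big(Q_y \,\|\, Q_{y|x=0}\big) = \mathbb{E}_y\left[\log\frac{q_y(\mathbf{y})}{q_{y|x=0}(\mathbf{y})}\right] = \mathbb{E}_w\left[\frac{S_N(\mathbf{w})}{N} \log\frac{S_N(\mathbf{w})}{N}\right].$$

To prove Lemma 17 it suffices to show that the sequence of RVs $\{V_N(\mathbf{w})\}$ where

$$V_N(\mathbf{w}) = \frac{S_N(\mathbf{w})}{N} \log\left(\frac{S_N(\mathbf{w})}{N}\right)$$

converges to 0 in mean as $N \to \infty$. To prove this result, we first show that $\{V_N(\mathbf{w})\}$ converges to 0 w.p.1. Then we argue that the sequence forms a backward submartingale [82, p. 474 and p. 499] so that it converges to 0 also in mean by the submartingale convergence theorem [83, Sec. 32.IV].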


A. Convergence w.p.1

The RVs $s_i(\mathbf{w})$ are i.i.d. for $i = 0, 1, \ldots, N-1$. As this result is rather tedious to prove, we postpone its proof to Appendix I-C. It is instead straightforward to prove that these RVs have mean 1. In fact,

$$\mathbb{E}_w[s_i(\mathbf{w})] = \int \left[1 - \frac{1}{\beta} + \frac{1}{\beta}\,\frac{q_{y|x=x_i}(\mathbf{w})}{q_w(\mathbf{w})}\right] q_w(\mathbf{w})\, d\mathbf{w} = 1.$$

It then follows from the strong law of large numbers that

$$\lim_{N \to \infty} \frac{S_N(\mathbf{w})}{N} = \mathbb{E}_w[s_0(\mathbf{w})] = 1 \quad \text{w.p.1}$$

and, as the function $r(x) = x \log x$ is continuous, we have by [78, Th. 4.6] that

$$\lim_{N \to \infty} V_N(\mathbf{w}) = \lim_{N \to \infty} r\left(\frac{S_N(\mathbf{w})}{N}\right) = r\left(\lim_{N \to \infty} \frac{S_N(\mathbf{w})}{N}\right) = r(1) = 0 \quad \text{w.p.1.}$$
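The convergence of $S_N(\mathbf{w})/N$ can be illustrated by simulation in the simplest setting. The sketch below assumes $K' = 1$ with unit channel variance, so that the density ratio depends on $u = |w[i]|^2$ only and equals $e^{u(1 - 1/a)}/a$ with $a = 1 + \beta PT$, where $u$ is exponentially distributed with unit mean. The parameter values and function names are hypothetical.

```python
import math
import random

def s_value(u, beta, a):
    """s_i for K' = 1, where u = |w[i]|^2 ~ Exp(1) and a = 1 + beta P T."""
    ratio = math.exp(u * (1.0 - 1.0 / a)) / a
    return 1.0 - 1.0 / beta + ratio / beta

def sample_mean(n_terms, beta, pt, seed=0):
    """Empirical S_N / N for N = n_terms i.i.d. noise samples."""
    rng = random.Random(seed)
    a = 1.0 + beta * pt
    total = sum(s_value(rng.expovariate(1.0), beta, a) for _ in range(n_terms))
    return total / n_terms
```

With $\beta = 2$ and $PT = 0.25$ (so that the $s_i$ have finite variance), the sample mean should be close to 1 for large $N$, and $V_N = (S_N/N)\log(S_N/N)$ correspondingly close to 0.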



B. Convergence in Mean 

As the RVs $\{s_i(\mathbf{w})\}$ are i.i.d., the sequence $\{V_N(\mathbf{w})\}$ and the decreasing sequence of $\sigma$-fields $\{\mathcal{G}_N\}$, where $\mathcal{G}_N$ is the smallest $\sigma$-field with respect to which the random variables $\{S_N(\mathbf{w}), S_{N+1}(\mathbf{w}), \ldots\}$ are measurable, form a backward (or reverse) submartingale [82, p. 474 and p. 499]. This result follows because the pair $(\{S_N(\mathbf{w})/N\}, \{\mathcal{G}_N\})$ is a backward martingale [82, p. 499], and because the function $r(x) = x \log x$ is convex.

Since $\{V_N(\mathbf{w})\}$ is a backward submartingale and $\{V_N(\mathbf{w})\}$ converges to 0 w.p.1 as $N \to \infty$, $\{V_N(\mathbf{w})\}$ converges to 0 as $N \to \infty$ also in mean. This result follows by the backward submartingale convergence theorem below:

Theorem 18 (see [83, Sec. 32.IV]): Let $\{X_N\}$ be a backward submartingale with respect to a decreasing sequence of $\sigma$-fields $\{\mathcal{G}_N\}$. Then $\{X_N\}$ converges w.p.1 and in mean to $X < \infty$ if and only if $\mathbb{E}[|X_1|] < \infty$ and $\lim_{N \to \infty} \mathbb{E}[X_N] > -\infty$.



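To conclude the proof, we need to show that the technical conditions in Theorem 18 hold, i.e., that the sequence $\{V_N(\mathbf{w})\}$ satisfies

$$\lim_{N \to \infty} \mathbb{E}_w[V_N(\mathbf{w})] > -\infty \tag{123}$$

and

$$\mathbb{E}_w[|V_1(\mathbf{w})|] = \mathbb{E}_w[|s_0(\mathbf{w}) \log s_0(\mathbf{w})|] < \infty. \tag{124}$$

The first inequality follows from Jensen's inequality and because the $s_i(\mathbf{w})$ have mean 1:

$$\mathbb{E}_w[V_N(\mathbf{w})] = \mathbb{E}_w\left[r\left(\frac{S_N(\mathbf{w})}{N}\right)\right] \ge r\left(\mathbb{E}_w\left[\frac{S_N(\mathbf{w})}{N}\right]\right) = r(1) = 0.$$

The second inequality is proven in Appendix I-D.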
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C. The Random Variables Sj(w) are i.i.d. 
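To show that the RVs

$$s_i(\mathbf{w}) = 1 - \frac{1}{\beta} + \frac{1}{\beta}\, \frac{q_{\mathbf{y}|\mathbf{x}=\mathbf{x}_i}(\mathbf{w})}{q_{\mathbf{w}}(\mathbf{w})}$$

are i.i.d., we first simplify $q_{\mathbf{y}|\mathbf{x}=\mathbf{x}_i}$ as

$$q_{\mathbf{y}|\mathbf{x}=\mathbf{x}_i}(\mathbf{w}) = \frac{\exp\!\left(-\sum_{j \ne i} \|\mathbf{w}[j]\|^2 - \mathbf{w}^H[i]\, \mathbf{A}^{-1} \mathbf{w}[i]\right)}{\pi^{KN} \det(\mathbf{A})} \tag{125}$$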



where we set

$$\mathbf{A} = \mathbf{I}_K + \beta P T\, \mathbf{R}_h[0] \tag{126}$$



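and where, as usual, $\mathbf{w} = [\mathbf{w}^T[0]\ \mathbf{w}^T[1]\ \cdots\ \mathbf{w}^T[N-1]]^T$. To obtain (125), we apply the determinant equality (118) to simplify the denominator. For the numerator, we used that, for the OO-WHK in Definition 16, the matrix $\mathbf{I}_{KN} + (\mathbf{x}_i \mathbf{x}_i^H) \odot \mathbf{R}_h$ is block diagonal, with $N-1$ blocks equal to $\mathbf{I}_K$ and one block equal to $\mathbf{A} = \mathbf{I}_K + \beta P T\, \mathbf{R}_h[0]$. Hence, its inverse is also block diagonal, with $N-1$ blocks equal to $\mathbf{I}_K$ and one block equal to $\mathbf{A}^{-1}$. Next, we use (125) to express the ratio $q_{\mathbf{y}|\mathbf{x}=\mathbf{x}_i}/q_{\mathbf{w}}$ as

$$\frac{q_{\mathbf{y}|\mathbf{x}=\mathbf{x}_i}(\mathbf{w})}{q_{\mathbf{w}}(\mathbf{w})} = \frac{1}{\det(\mathbf{A})} \exp\!\left(\|\mathbf{w}[i]\|^2 - \mathbf{w}^H[i]\, \mathbf{A}^{-1} \mathbf{w}[i]\right). \tag{127}$$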

This last result implies that each $s_i(\mathbf{w})$ depends only on the random noise vector $\mathbf{w}[i]$. As the noise is white, the random vectors $\mathbf{w}[i]$ are i.i.d. for all $i$. Hence, the RVs $s_i(\mathbf{w})$ are i.i.d. as well.
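The key observation of this subsection, that the density ratio depends on $\mathbf{w}$ only through $\mathbf{w}[i]$, can be checked numerically. The sketch below is restricted to the scalar case $K = 1$ (an assumption made to stay within the Python standard library), where $\mathbf{A}$ reduces to a positive number `a` and both densities are products of independent circularly symmetric complex Gaussian densities. The function names are hypothetical.

```python
import math


def density(w, variances):
    """Pdf of independent circularly symmetric complex Gaussian entries."""
    exponent = sum(abs(z) ** 2 / v for z, v in zip(w, variances))
    norm = math.pi ** len(w) * math.prod(variances)
    return math.exp(-exponent) / norm


def ratio_direct(w, i, a):
    """q_{y|x=x_i}(w) / q_w(w), computed from the two densities (K = 1)."""
    variances_on = [a if j == i else 1.0 for j in range(len(w))]
    return density(w, variances_on) / density(w, [1.0] * len(w))


def ratio_formula(w, i, a):
    """Scalar form of the closed-form ratio: exp(|w_i|^2 - |w_i|^2 / a) / a."""
    u = abs(w[i]) ** 2
    return math.exp(u - u / a) / a


if __name__ == "__main__":
    w = [0.3 + 0.4j, -1.0 + 0.2j, 0.5 - 0.7j]
    for i in range(len(w)):
        print(i, ratio_direct(w, i, 1.5), ratio_formula(w, i, 1.5))
```

Changing any entry of `w` other than the `i`-th leaves `ratio_direct(w, i, a)` unchanged, which is the property used to conclude that the $s_i(\mathbf{w})$ are independent.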



D. Proof of Inequality (124)

As $x \log x \ge -e^{-1}$ for all $x > 0$, we have that $|x \log x| \le x \log x + 2e^{-1}$; hence,

$$\mathbb{E}_{\mathbf{w}}[|s_0(\mathbf{w}) \log s_0(\mathbf{w})|] \le \mathbb{E}_{\mathbf{w}}[s_0(\mathbf{w}) \log s_0(\mathbf{w})] + 2e^{-1}.$$

We next use the convexity of $x \log x$ and that $\beta > 1$ to upper-bound $s_0(\mathbf{w}) \log s_0(\mathbf{w})$ as

(128)



where (a) follows from the definition of convexity, and in (b) we used that $\beta > 1$. If we take the expectation on both sides of (128), we get

$$\cdots\; q_{\mathbf{w}}(\mathbf{w})\, d\mathbf{w} \;\cdots\; \frac{1}{\pi^{K} \det(\mathbf{A})} \;\cdots\; \Big[\cdots + \mathbf{w}^H[0]\, \mathbf{A}^{-1} \mathbf{w}[0] + \log(\det(\mathbf{A}))\Big]\, d\mathbf{w}[0] < \infty$$



where (a) follows because $q_{\mathbf{y}|\mathbf{x}=\mathbf{x}_0}(\mathbf{w}) > 0$ for all $\mathbf{w}$; in (b) we used (125) and (127), while to obtain (c) we first integrated over $\{\mathbf{w}[i]\}_{i=1}^{N-1}$ and then used the triangle inequality and that $\mathbf{A}$ is positive definite with eigenvalues larger than or equal to 1 [see (126)]. The last inequality holds because $\mathbf{A}$ satisfies the trace constraint $\mathrm{tr}(\mathbf{A}) = K(1 + \beta P T)$, which implies that its eigenvalues are bounded.
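The elementary inequality $|x \log x| \le x \log x + 2e^{-1}$ used at the start of this proof is easy to verify numerically. The short sketch below (hypothetical function names, an illustrative logarithmic grid) evaluates the slack between the two sides; the slack vanishes only at $x = e^{-1}$, where $x \log x$ attains its minimum $-e^{-1}$.

```python
import math


def r(x):
    """r(x) = x log x for x > 0."""
    return x * math.log(x)


def slack(x):
    """Right-hand side minus left-hand side of |x log x| <= x log x + 2/e."""
    return r(x) + 2.0 / math.e - abs(r(x))


# Logarithmic grid from 1e-6 to 1e3 (an arbitrary illustrative range).
grid = [10.0 ** (k / 10.0) for k in range(-60, 31)]
min_slack = min(slack(x) for x in grid)

if __name__ == "__main__":
    print("smallest slack on the grid:", min_slack)
    print("slack at x = 1/e:", slack(1.0 / math.e))
```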

Appendix J 
Proof of Theorem 8



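We use the decomposition of mutual information as a difference of KL divergences (115), and upper-bound $\sup_{\mathcal{S}} I(\mathbf{y}; \mathbf{x})$ in (56) by using that the KL divergence is nonnegative:

$$\sup_{\mathcal{S}} I(\mathbf{y}; \mathbf{x}) = \sup_{\mathcal{S}} \left\{ \mathbb{E}_{\mathbf{x}}\!\left[D(Q_{\mathbf{y}|\mathbf{x}} \,\|\, Q_{\mathbf{y}|\mathbf{x}=\mathbf{0}})\right] - D(Q_{\mathbf{y}} \,\|\, Q_{\mathbf{y}|\mathbf{x}=\mathbf{0}}) \right\} \tag{129}$$

$$\le \sup_{\mathcal{S}} \mathbb{E}_{\mathbf{x}}\!\left[D(Q_{\mathbf{y}|\mathbf{x}} \,\|\, Q_{\mathbf{y}|\mathbf{x}=\mathbf{0}})\right]. \tag{130}$$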



As in the proof of Theorem 1, we rewrite the supremum over the distributions in the set $\mathcal{S}$ as a double supremum over $\alpha \in [0, 1]$ and over the restricted set of input distributions $\mathcal{S}|_\alpha$ that satisfy the average power constraint $\mathbb{E}[\|\mathbf{x}\|^2] = \alpha K P T$ and the peak constraint (23). Then, we use the closed-form expression for the KL divergence of two multivariate Gaussian vectors (116) and we follow the same arguments as in the proof of Theorem 1:

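$$\begin{aligned}
\frac{1}{KT} \sup_{\mathcal{S}} \mathbb{E}_{\mathbf{x}}\!\left[D(Q_{\mathbf{y}|\mathbf{x}} \,\|\, Q_{\mathbf{y}|\mathbf{x}=\mathbf{0}})\right]
&= \sup_{0 \le \alpha \le 1}\, \sup_{\mathcal{S}|_\alpha} \left\{ \alpha P - \frac{1}{KT}\, \mathbb{E}\!\left[\log\det\!\left(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\right)\right] \right\} \\
&= \sup_{0 \le \alpha \le 1} \left\{ \alpha P - \inf_{\mathcal{S}|_\alpha} \frac{1}{KT}\, \mathbb{E}\!\left[\log\det\!\left(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\right)\right] \right\} \\
&\le \sup_{0 \le \alpha \le 1} \left\{ \alpha P - \alpha P \inf_{\mathbf{x}} \frac{\log\det\!\left(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\right)}{\|\mathbf{x}\|^2} \right\} \\
&= P - P \inf_{\mathbf{x}} \frac{\log\det\!\left(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\right)}{\|\mathbf{x}\|^2}.
\end{aligned} \tag{131}$$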



The infimum in (131) has the same structure as the infimum (33) in the proof of Theorem 1. Hence, as $\mathbf{R}_h$ is positive semidefinite, we can conclude that the infimum in (131) is achieved on the boundary of the admissible set. Differently from the proof of Theorem 1, however, the input signal is subject to a peak constraint in time, so that the admissible set is defined by the two conditions

$$|x[k,n]|^2 \in \{0, \beta P T\}, \qquad \sum_{n=0}^{N-1} |x[k,n]|^2 \le \beta P T, \quad \text{w.p.1.} \tag{132}$$
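To make the two conditions in (132) concrete, the following sketch checks them for a given time-frequency pattern. It is an illustration only: the vector is stored as a list of $K$ rows (time slots), each with $N$ entries (frequency slots), the helper names are hypothetical, and the value $\beta P T = 4$ in the usage example is arbitrary.

```python
def satisfies_132(x, beta_pt, tol=1e-12):
    """Check both conditions in (132) for x[k][n] (time k, frequency n)."""
    for row in x:
        energies = [abs(v) ** 2 for v in row]
        # First condition: every entry is either off or at the peak value.
        on_off = all(e <= tol or abs(e - beta_pt) <= tol for e in energies)
        # Second condition: per-time-slot energy does not exceed the peak.
        if not on_off or sum(energies) > beta_pt + tol:
            return False
    return True


def active_per_slot(x, tol=1e-12):
    """Number of nonzero frequency entries in each time slot."""
    return [sum(abs(v) ** 2 > tol for v in row) for row in x]


if __name__ == "__main__":
    # K = 4 time slots, N = 3 frequency slots, beta*P*T = 4 (|entry| = 2 when on).
    x_ok = [[2, 0, 0], [0, 0, 2j], [0, 0, 0], [0, -2, 0]]
    x_bad = [[2, 2, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
    print(satisfies_132(x_ok, 4.0), active_per_slot(x_ok))
    print(satisfies_132(x_bad, 4.0), active_per_slot(x_bad))
```

Any pattern that passes the check has at most one active frequency per time slot, since two entries at the peak value in the same slot would already violate the sum constraint.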



Hence, a necessary condition for a vector $\mathbf{x}$ to minimize $\log\det\!\left(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\right)/\|\mathbf{x}\|^2$ is the following: for any fixed $k$, $x[k,n]$ may be different from $0$ for at most one discrete frequency $n$. An example of such a vector is shown in Fig. 4.

Fig. 4. The entries in the time-frequency plane of a vector $\mathbf{x}$ that satisfies the necessary condition to minimize $\log\det\!\left(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\right)/\|\mathbf{x}\|^2$ in (132) for the case $K = 4$.

Even if the structure of the vector minimizing the second term on the RHS of (131) is known, the infimum in (131) does not seem to admit a closed-form expression. We can, however, obtain the following closed-form lower bound on the infimum if we replace the constraint $\sum_{n=0}^{N-1} |x[k,n]|^2 \le \beta P T$ w.p.1 in (132) with the less stringent constraint $|x[k,n]|^2 \le \beta P T$ w.p.1 for all $k$ and $n$. The infimum of $\log\det\!\left(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\right)/\|\mathbf{x}\|^2$ over the vectors $\mathbf{x}$ that belong to the new admissible set can be bounded as in (34), after replacing $\beta P T/N$ by $\beta P T$ and proceeding as in (17):



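$$\inf_{\mathbf{x}} \frac{\log\det\!\left(\mathbf{I}_{KN} + (\mathbf{x}\mathbf{x}^H) \odot \mathbf{R}_h\right)}{\|\mathbf{x}\|^2} \ge \frac{1}{\beta P T} \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} \log\!\left(1 + \beta P T\, c(\theta, \varphi)\right) d\theta\, d\varphi \tag{133}$$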

$$= \frac{F}{\beta P} \iint \log\!\left(1 + \frac{\beta P}{F}\, C_H(\nu, \tau)\right) d\tau\, d\nu.$$



To conclude the proof, we substitute (133) in (131) and obtain the desired upper bound (58).